ABSTRACT

Future versions of Java will include support for parametric
polymorphism, or generic classes. This will bring many benefits to
Java programmers, not least because current Java practice makes
heavy use of pseudo-generic classes. Such classes (for example,
those in package java.util) have logically generic specifications
and documentation, but the type system cannot prove their patterns
of use to be safe.

This work aims to solve the problem of automatic translation of 
Java source code into Generic Java (GJ) source code. We present 
two algorithms that together can be used to translate automatically 
a Java source program into a semantically-equivalent GJ program 
with generic types. 

The first algorithm infers a candidate generalisation for any class, 
based on the methods of that class in isolation. The second algo- 
rithm analyses the whole program; it determines a precise paramet- 
ric type for every value in the program. Optionally, it also refines 
the generalisations produced by the first analysis as required by the 
patterns of use of those classes in client code. 

1. INTRODUCTION 

The next release of the Java programming language [12] is an- 
ticipated to include support for generic types. Generic types (or 
parametric polymorphism [6]), which make it possible to write a 
class or function abstracted over the types of its arguments, are one 
of the most wished-for programming language features in the Java 
community — in fact, their inclusion has been the #1 request-for- 
enhancement (RFE) for many years [13]. 

In the absence of generic types, Java programmers have been 
writing and using pseudo-generic classes, such as those in package 
java.util, which are expressed in terms of Object. Clients of 
such classes widen all the actual parameters to methods, and must 
down-cast all the return values to the type at which the pseudo- 
generic class is 'instantiated' in a fragment of client code. This 
leads to three problems: 

1. The Possibility of Error: Java programmers often think in
terms of generic types when using pseudo-generic classes.
However, the Java type system is unable to prove that such
objects are consistently used. This disparity allows the
programmer to write, inadvertently, type-correct Java source
code that manipulates objects of pseudo-generic classes in a
manner inconsistent with the desired truly-parametric type. A
programmer's first indication of such an error is typically a
run-time exception due to a failing cast; compile-time
checking is preferable.

2. An Incomplete Specification: The types in a Java program
serve as a rather weak specification of the behaviour of
the program and the intention of the programmer. Generic
types provide better documentation, and the type checker
guarantees their accuracy.

3. Lexical Complexity: The user must explicitly downcast the
objects retrieved from pseudo-generic classes, leading to
syntactic clutter. (The non-generic type declarations are short,
however.)

Non-generic solutions to the problems (e.g., defining wrapper classes 
such as StringVector) are inconvenient and error prone. 
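The first and third problems can be illustrated with a minimal sketch. It is written in present-day Java generics syntax (which descends from GJ) rather than in GJ itself, and the class and method names are hypothetical, chosen only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; not part of the paper's tool.
class PseudoGenericDemo {

    /** Pseudo-generic use: type-correct Java, but the downcast fails at run time. */
    @SuppressWarnings({"rawtypes", "unchecked"})
    static boolean rawUseFailsAtRunTime() {
        List names = new ArrayList();        // raw (pseudo-generic) type
        names.add("alice");
        names.add(Integer.valueOf(42));      // accepted: the parameter is widened to Object
        try {
            String s = (String) names.get(1); // explicit downcast (syntactic clutter)
            return false;
        } catch (ClassCastException e) {
            return true;                      // the error surfaces only here
        }
    }

    /** Generic use: the same mistake is rejected by the type checker, and no cast is needed. */
    static String genericUse() {
        List<String> names = new ArrayList<>();
        names.add("alice");
        // names.add(Integer.valueOf(42));   // would not compile
        return names.get(0);
    }
}
```

The raw version compiles without error and fails only when the cast executes; the generic version moves that check to compile time.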

Currently, programmers who wish to take advantage of the
benefits of Generic Java must translate their source code by hand;
this process is time-consuming, tedious, and error-prone. We propose
to automate the translation of existing Java source files into GJ,
and of Java class-files to GJ class-files¹. There are two parts to
this task: adding type parameters to class definitions, and
modifying uses of the classes to supply the type parameters.

There are multiple solutions to this problem. Two trivial solu- 
tions are as follows. (1) Do not use generic types at all. Since GJ 
is a superset of Java, a valid Java program is a valid GJ program 
in which each type is a GJ "raw" type, which is not parameterised. 
(2) Use some set of generic types, but always instantiate them at 
their upper bounds, and insert casts exactly where they appear in 
the Java program. For example, create one type parameter for each 
instance of a type name in the class, and instantiate the class us- 
ing those actual types. Each of these two trivial solutions behaves 
exactly like the original program, but reaps none of the benefits of 
parametric polymorphism. 

Our goal is to produce a set of polymorphic class abstractions 
that capture exactly those aspects of each class that are actually 
used generically, and the set of the most specific valid instantiations 
of those types for each use in the given client code. This is the ideal 
generalisation that experienced Java programmers would agree is 
the preferred GJ type for a particular Java class in the context of an 
application. 

This paper presents two algorithms that together translate the
source code of a Java program into source code for a semantically
equivalent Generic Java (GJ) [4, 5] program. The first,
parameterisation algorithm is an implementation-side analysis that
infers both which classes are inherently polymorphic and also a
candidate set of type variables (and their bounds) over which each
polymorphic class should be abstracted. The second, instantiation
algorithm is a whole-program analysis that infers at what type
clients instantiate the polymorphic classes. The instantiation
analysis also refines the candidate type parameters of each class,
based on client use of the code and on constraints inexpressible in
GJ. Because Java and GJ treat primitive types identically and
generic classes can only be abstracted over reference types,
primitive types are largely irrelevant to this paper, so the
algorithms ignore, or provide obvious default interpretations for,
values of primitive types.

¹ GJ class-files are class-files containing Signature attributes for
each generic or parametric type.

We constrain ourselves to the confines of the GJ language rather 
than selecting or inventing a new language that permits easier in- 
ference or makes different tradeoffs. (For example, some other de- 
signs are arguably more powerful or expressive, but lack GJ's inte- 
gration with existing Java programs and virtual machines.) This de- 
cision sheds light on the GJ language design and makes our work of 
direct practical interest to Java programmers who wish to upgrade 
to the next version of the language. This paper uses the term GJ to 
refer to two closely related versions of Java with generic types [4, 
14]. The differences between these languages are not significant to 
this paper. 

We describe our analyses at the representation level of JVM byte- 
codes. This simplifies the treatment of a number of source lan- 
guage features, such as class-nesting, anonymous classes, gener- 
ated methods, special operators (e.g., + for String), re-use of lo- 
cal variables, etc. Additionally, it permits the analysis to be run on 
classes for which source is not available (GJ allows one to retrofit a 
generic type onto a pre-existing class file). Section 4 discusses how 
to map the results into the GJ source domain. 

Our algorithms are not guaranteed to produce a perfect result 
(which may depend on the intended use of the class in any event). 
However, the automated translation is guaranteed to be self-consistent 
and semantically equivalent and in typical cases (based on our hand 
simulation of dozens of potentially problematic classes) matches or 
is close to the desired goal. Programmers could interact with a tool 
to refine results (see Section 2.6) or make adjustments directly to 
the resulting code. 

The remainder of this paper is organised as follows. Section 2
presents the analysis for determining how many type parameters a
class definition should have, and Section 2.6 explains how these
results can be refined. Section 3 explains how to instantiate
generic types at uses such as declarations; the approach is to
generate (Section 3.4) and resolve (Section 3.6) instantiation
constraints. Section 4 shows how the results enable a translation
of the Java program into GJ. Section 5 reviews related work, and
Section 6 concludes.

2. PARAMETERISATION ANALYSIS 

This section describes the algorithm that obtains intrinsically 
(via implementation-side analysis of a single class in isolation) a 
candidate set of type parameters for each class in the program. 

The algorithm generates the most general possible type param- 
eter set: it introduces as many distinct type variables as possible 
such that the program still type-checks. Section 2.6 discusses ways 
to improve the results of this analysis, both by additional analysis 
and by programmer intervention. 

The algorithm is a dataflow analysis, and its aim is to determine 
a set of constraints between the type variables (and types) that will 
be used at each declaration (in our terminology, origin) in the trans- 
lated GJ program. The algorithm works by computing which ori- 
gins flow to each type variable. 

class Stack
{
    private Object[] data = new Object[10];
    private int size = 0;
    Object top() {
        return data[size-1];
    }
    Object pop() {
        return data[--size];
    }
    void push(Object o) {
        data[size++] = o;
    }
    void exchange() {
        Object o1 = pop(), o2 = pop();
        push(o1);
        push(o2);
    }
}

Figure 1: A simple polymorphic stack implementation in Java.

As a simple example, consider the class Box (sometimes known as
Cell):

class Box {
    public void set(Object v) { this.v = v; }
    public Object get() { return v; }
    private Object v;
}

The analysis identifies three type variables and their bounds:

class Box<A extends B, B extends C, C extends Object> {
    public void set(A v);
    public C get();
    private B v;
}

This result is a valid GJ program, but it fails to capture the
simple generic type (with a single type parameter) intended by the
programmer. Our algorithms obtain the simpler type by examining
uses of the class: either uses within the class itself (discovered
by the parameterisation analysis, see Section 3.6.1) or external
uses (discovered by the instantiation analysis, see Section 3).
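The difference between the candidate parameterisation and the intended one can be made concrete. The following is a minimal sketch in present-day Java generics syntax; the class names Box3, Box1, and BoxDemo are hypothetical, and method bodies are filled in so that the example compiles.

```java
// Candidate parameterisation: one type variable per origin, chained by bounds.
class Box3<A extends B, B extends C, C> {
    private B v;
    public void set(A v) { this.v = v; }   // A widens to B
    public C get() { return v; }           // B widens to C
}

// The single-parameter type a programmer would normally intend.
class Box1<T> {
    private T v;
    public void set(T v) { this.v = v; }
    public T get() { return v; }
}

// Hypothetical client: when every parameter is instantiated at the same
// type, the three-variable class behaves like the one-variable class.
class BoxDemo {
    static String roundTrip(String s) {
        Box3<String, String, String> wide = new Box3<>();
        wide.set(s);
        Box1<String> narrow = new Box1<>();
        narrow.set(wide.get());
        return narrow.get();
    }
}
```

A client that always supplies the same type for A, B, and C gains nothing from the extra variables, which is why examining uses of the class lets the three variables be fused into one.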

We illustrate this process with a running example, a very simple 
Stack class shown in Figure 1. 

2.1 Origins 

The parameterisation algorithm begins by identifying origins. 
The set of origins is a superset of the set of type parameters. The 
parameterisation algorithm determines subtype and equality con- 
straints among origins. 

To a first approximation, the set of origins is the set of places in 
a class's signature-body where a type-variable may legally appear 
in GJ: an origin is a declarator of a parameter-type or return-type in 
the signature of a non-static method, or a declarator of a non-static 
field type. For the purposes of this analysis, each class should be 
thought of as the result of 'flattening' everything inherited from 
its superclasses and interfaces into a single record containing all its 
fields, non-overridden methods, and methods accessible via super. 

There are additional origins for local variable declarations, array
types, and array creation sites. This implies that the set of
origins depends on the implementation as well as the signature of a
class; see Section 2.3.3. For each origin A of array type, there is
an origin A' corresponding to the elements of origin A. (If A' is
itself of array type, it gives rise to A'', and so forth.) There is
also an origin for each occurrence of the new[] array-creation
operator in the methods of the class.



Number  Name                  Declared type
O1      Stack.data            Object[]
O2      Stack.data'           Object
O3      Stack.anewarray1      Object[]
O4      Stack.anewarray1'     Object
O5      Stack.top::return     Object
O6      Stack.pop::return     Object
O7      Stack.push::o         Object
O8      Stack.exchange::o1    Object
O9      Stack.exchange::o2    Object

Figure 2: Origins for the Stack example of Figure 1. In origin
names, the :: operator denotes 'in the scope of'. Origins O3 and
O4 represent the expression new Object[10] that appears in the
body of the (implicit) Stack constructor. Origin numbers have no
semantic meaning, but are only used for presentation in this paper.

                 NoVariable

                 {O1,O2,O3}

      {O1,O2}     {O1,O3}     {O2,O3}

       {O1}        {O2}        {O3}

                    Null

Figure 3: Lattice for the parameterisation analysis. Between
NoVariable and Null is $\mathcal{P}(\{O_1, \ldots, O_n\})$, the
power-set of the n origins, ordered by subset inclusion $\subseteq$.
The value Unknown in the dataflow rules maps to Null in the lattice.
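The join operation implied by this lattice can be sketched in a few lines. This is an illustration under stated assumptions rather than the authors' implementation: the class name LatticeValue is hypothetical, origins are represented by their integer numbers, and Null and NoVariable are modelled as two sentinel objects.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

/** A value of the lattice of Figure 3 (hypothetical representation). */
final class LatticeValue {
    private final Set<Integer> origins;   // null for the two sentinels

    private LatticeValue(Set<Integer> origins) { this.origins = origins; }

    /** Bottom of the lattice; Unknown is mapped to this value as well. */
    static final LatticeValue NULL = new LatticeValue(null);
    /** Top of the lattice. */
    static final LatticeValue NO_VARIABLE = new LatticeValue(null);

    /** A set of reaching origins, given by origin number. */
    static LatticeValue of(Integer... ids) {
        return new LatticeValue(new TreeSet<>(Arrays.asList(ids)));
    }

    /** Least upper bound: NoVariable absorbs, Null is neutral, sets are unioned. */
    LatticeValue join(LatticeValue other) {
        if (this == NO_VARIABLE || other == NO_VARIABLE) return NO_VARIABLE;
        if (this == NULL) return other;
        if (other == NULL) return this;
        Set<Integer> union = new TreeSet<>(origins);
        union.addAll(other.origins);
        return new LatticeValue(union);
    }

    @Override public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof LatticeValue)) return false;
        LatticeValue v = (LatticeValue) o;
        // Sentinels are equal only to themselves (handled by the identity check).
        return origins != null && v.origins != null && origins.equals(v.origins);
    }

    @Override public int hashCode() {
        return origins == null ? System.identityHashCode(this) : origins.hashCode();
    }
}
```

Because the power-set is ordered by inclusion, joining two sets is simply their union; the two sentinels handle the extremes of the lattice.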



Figure 2 shows the origins for class Stack.

The following properties are defined for each origin:

• javadecl(O) is the Java type associated with origin O; this is
the Declared type column of Figure 2.

• element(O) is the origin associated with the element type of
O; it is defined iff javadecl(O) is an array type. In the Stack
example, element(O1) = O2.

The analysis makes use of the helper function origin(name),
which returns an origin given its name, which may be specified
either by identifier (e.g., C.f::o) or by abstract syntax (e.g.,
C.f::return, C.f::formal_i).
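The origin properties and the origin(name) helper described above can be sketched as a small table. The classes Origin and OriginTable and the declare method are hypothetical names introduced for illustration, and Java types are represented simply as strings.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** One origin: a name, its declared Java type, and (for arrays) its element origin. */
final class Origin {
    final String name;
    final String javadecl;
    final Origin element;   // null unless javadecl is an array type

    Origin(String name, String javadecl, Origin element) {
        this.name = name;
        this.javadecl = javadecl;
        this.element = element;
    }
}

/** Hypothetical registry of origins, looked up by name. */
final class OriginTable {
    private final Map<String, Origin> byName = new LinkedHashMap<>();

    /** Registers an origin; an array-typed origin A also gets A', A'', and so on. */
    Origin declare(String name, String javadecl) {
        Origin element = null;
        if (javadecl.endsWith("[]")) {
            String elementType = javadecl.substring(0, javadecl.length() - 2);
            element = declare(name + "'", elementType);
        }
        Origin o = new Origin(name, javadecl, element);
        byName.put(name, o);
        return o;
    }

    /** The helper function origin(name). */
    Origin origin(String name) {
        return byName.get(name);
    }
}
```

Declaring "Stack.data" with type "Object[]" yields the two origins numbered O1 and O2 in Figure 2, with the element of the first being the second.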



2.2 Abstract values

2.2.1 Abstract value lattice

Each abstract value represents the types of values (more
precisely, sets of origins that declare values) that can flow to a
given Java (stack or local) variable or to an origin. One could call
this the set of reaching origins, by analogy with reaching
definitions.

The lattice
$L = (\mathcal{P}(N) \cup \{NoVariable, Null\}, \subseteq)$ in
Figure 3 is the domain of abstract values in the analysis.

Null is the bottom ($\bot$) of the lattice since null values can
flow into variables of any reference type. The Unknown value
indicating no information about a value (e.g., because they are
reached via pointers other than this) is also mapped to the Null
($\bot$) lattice value; thus, values for which we have no
information do not affect the results. We distinguish Unknown from
Null in the dataflow rules, even though they are the same in this
lattice, because this permits a single set of dataflow rules to be
used for both this analysis and an alternative one (not discussed in
this paper) that eagerly fuses type variables.

NoVariable is the top ($\top$) of the lattice since it represents
values that cannot flow into variables whose type is given by a
type-variable. In other words, such values stand for
non-parameterised types.

2.2.2 Abstract state

Section 2.3 presents the transfer functions of the dataflow
analysis as an alternative operational semantics for JVM bytecodes
[16]. The abstract state of the JVM at each program point is
represented as the triple State, defined as follows:

$State = \langle Stack \times Locals \times Origins \rangle$

• $Stack = Value^{*}$ is a stack of abstract values, with the
topmost element to the right; the invariant part is shown as '...',
and the operands pushed and popped by that transfer function
are named.

• $Locals = Value^{num\_locals}$ is a fixed-size array of local
abstract variables.

• $Origins = Origin^{num\_origins}$ is a fixed-size tuple of
origins, one for each origin in the current class. We use the
functional notation O[x := y] to represent the Origins tuple O with
the slot indexed by x updated to contain y. (No join is necessary;
the dataflow join operator takes care of that detail.) Analysis
of each method generates an Origins tuple; the result of
analysing the entire class is the join of all of these tuples.

The State triple induces a cross-product lattice whose partial
order relation is the pointwise application of the partial order
relation of its three elements. In turn, the ordering relation for
each of these three elements is the pointwise application of the
partial order relation specified by the value lattice (see Figure 3)
to the elements of each of these sets.

The well-formedness of the JVM program ensures that joins are
well-defined; for example, all pairs of stacks compared are of the
same height.



2.3 Dataflow rules 

This section gives the dataflow rules — one per bytecode instruc- 
tion — for the parameterisation analysis. 

The dataflow analysis is applied to all instance methods of the
class, including methods inherited from super-classes. Static
methods are omitted since they are outside the scope of a class's
type variables. Native and abstract methods have no bodies, so the
analysis can do nothing with them; however, Section 2.5.1 describes
a technique whereby constraints on the origins of such methods may
be inferred.

In each rule, M refers to the current method, and C to the current
class; this is a synonym for M::formal_0.

Figure 4 gives the results of the dataflow analysis for the Stack 
example. For brevity, the details of the abstract execution are not 
shown, but the annotations on the code show, for each source/sink 
origin pair connected by the dataflow solution, which line of source 
code generated it. 

class Stack
{
    private Object[] data = new Object[10]; // O3 -> O1
    private int size = 0;
    Object top() {
        return data[size-1];                // O2 -> O5
    }
    Object pop() {
        return data[--size];                // O2 -> O6
    }
    void push(Object o) {
        data[size++] = o;                   // O7 -> O2
    }
    void exchange() {
        Object o1 = pop(),                  // O6 -> O8
               o2 = pop();                  // O6 -> O9
        push(o1);                           // O8 -> O7
        push(o2);                           // O9 -> O7
    }
}

Figure 4: Dataflow results for the Stack example of Figure 1.

• Entry: pseudo-rule for procedure-entry block

$\square \Rightarrow \langle [\,],\ [arg_0, \ldots, arg_n],\ (\bot, \ldots, \bot) \rangle$

$\forall i \in [0..n] : arg_i = \{origin(M{::}formal_i)\}$

Recall that this is M::formal_0.

• Return: procedure return

$\langle [\ldots\ retval],\ locals,\ O \rangle \xrightarrow{\texttt{return}} \square$

Retain O[origin(M::return) := retval] as the result of analysing
this method; the Origins objects for each method are joined
together to produce the result of the class analysis.

• Invoke: method/constructor call. Calls to static methods
are not parameterised; calls with a receiver of this have
parameters identical to those of C; and nothing is known about
calls with any other receiver.

$\langle [\ldots\ receiver?\ arg_1 \ldots arg_n],\ locals,\ O \rangle \xrightarrow{\texttt{invoke}\ M'} \langle [\ldots\ retval],\ locals,\ O' \rangle$

if M' is static (no receiver), retval = NoVariable and O' = O.

if $receiver \neq \{origin(this)\}$, retval = Unknown and O' = O.

if $receiver = \{origin(this)\}$:

    $retval = \{origin(M'{::}return)\}$

    $O' = O[\forall i \in [1..n] : origin(M'{::}formal_i) := arg_i]$

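• New: object creation expressions

$\langle [\ldots],\ locals,\ O \rangle \xrightarrow{\texttt{new}\ C} \langle [\ldots\ NoVariable],\ locals,\ O \rangle$

• NewArray: new arrays of primitive type

$\langle [\ldots\ count],\ locals,\ O \rangle \xrightarrow{\texttt{newarray}} \langle [\ldots\ NoVariable],\ locals,\ O \rangle$

• String: string literals

$\langle [\ldots],\ locals,\ O \rangle \xrightarrow{\texttt{ldc}} \langle [\ldots\ NoVariable],\ locals,\ O \rangle$

• CheckCast

$\langle [\ldots\ expr],\ locals,\ O \rangle \xrightarrow{\texttt{checkcast}} \langle [\ldots\ Unknown],\ locals,\ O \rangle$

• Null: the null literal

$\langle [\ldots],\ locals,\ O \rangle \xrightarrow{\texttt{aconst\_null}} \langle [\ldots\ Null],\ locals,\ O \rangle$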



• PutField: writes to instance fields

$\langle [\ldots\ receiver\ value],\ locals,\ O \rangle \xrightarrow{\texttt{putfield}\ F} \langle [\ldots],\ locals,\ O' \rangle$

if $receiver = \{origin(this)\}$, $O' = O[origin(F) := value]$

otherwise, O' = O.

• GetField: reads from instance fields

$\langle [\ldots\ receiver],\ locals,\ O \rangle \xrightarrow{\texttt{getfield}\ F} \langle [\ldots\ value],\ locals,\ O \rangle$

if $receiver = \{origin(this)\}$, $value = \{origin(F)\}$

if $receiver \neq \{origin(this)\}$, value = Unknown



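• PutStatic: writes to static fields

$\langle [\ldots\ value],\ locals,\ O \rangle \xrightarrow{\texttt{putstatic}\ S} \langle [\ldots],\ locals,\ O \rangle$

• GetStatic: reads from static fields

$\langle [\ldots],\ locals,\ O \rangle \xrightarrow{\texttt{getstatic}\ S} \langle [\ldots\ NoVariable],\ locals,\ O \rangle$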

• ANewArray: new array of references

$\langle [\ldots\ count],\ locals,\ O \rangle \xrightarrow{\texttt{anewarray}_n\ C} \langle [\ldots\ n],\ locals,\ O \rangle$

where n is the origin number of the anewarray operator.

• AALoad: load from array

$\langle [\ldots\ arrayref\ index],\ locals,\ O \rangle \xrightarrow{\texttt{aaload}} \langle [\ldots\ value],\ locals,\ O \rangle$

if $arrayref \notin \{NoVariable, Unknown\}$,
$value = \{element(a) \mid a \in arrayref\}$

otherwise value = Unknown



• AAStore: store into array

$\langle [\ldots\ arrayref\ index\ value],\ locals,\ O \rangle \xrightarrow{\texttt{aastore}} \langle [\ldots],\ locals,\ O' \rangle$

if $arrayref \notin \{NoVariable, Unknown\}$,
$O' = O[\forall a \in arrayref,\ element(a) := value]$

otherwise, O' = O

• StackManipulation: all stack and local-variable manip- 
ulation operations (e.g., dup, swap, push, pop, load, store) 
are defined as in the standard semantics. 

• Arithmetic: all arithmetic operations simply pop and push
the stack as required. All values pushed are primitive values,
which are irrelevant to this analysis, so the $\bot$ lattice value
is used.

• ControlFlow: all control flow operators simply pop the 
stack as required. Of course, they also define the control-flow 
graph as used by the dataflow infrastructure. 

Note the symmetry of the rules for procedure entry/return and
method invocation (Entry/Return and Invoke), for reading and
writing fields (PutField and GetField), and for loading from and
storing to arrays (AALoad and AAStore).
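To show how a few of these rules combine, the following minimal sketch hand-simulates the field and array rules on two methods of the Stack example. It is an illustration under simplifying assumptions, not the authors' implementation: the class TransferSketch and its method names are hypothetical, origins are plain integers, only the receiver-is-this case is modelled, and the NoVariable and Unknown values are omitted.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical, much-reduced model of the GetField, AALoad, AAStore, and Return rules. */
final class TransferSketch {
    /** element(a): array origin number to element origin number. */
    final Map<Integer, Integer> element = new HashMap<>();
    /** The Origins tuple: for each origin, the set of origins that flow into it. */
    final Map<Integer, Set<Integer>> origins = new HashMap<>();

    /** GetField with receiver this: the value is {origin(F)}. */
    Set<Integer> getFieldOnThis(int fieldOrigin) {
        return Collections.singleton(fieldOrigin);
    }

    /** AALoad: value = { element(a) | a in arrayref }. */
    Set<Integer> aaload(Set<Integer> arrayref) {
        Set<Integer> value = new TreeSet<>();
        for (int a : arrayref) value.add(element.get(a));
        return value;
    }

    /** AAStore: every element(a) for a in arrayref receives value. */
    void aastore(Set<Integer> arrayref, Set<Integer> value) {
        for (int a : arrayref) flowInto(element.get(a), value);
    }

    /** Return: the method's return origin receives retval. */
    void returnValue(int returnOrigin, Set<Integer> retval) {
        flowInto(returnOrigin, retval);
    }

    // Updates are accumulated by set union, standing in for the dataflow join.
    private void flowInto(int sink, Set<Integer> sources) {
        origins.computeIfAbsent(sink, k -> new TreeSet<>()).addAll(sources);
    }

    /** Simulates push (data[size++] = o) and top (return data[size-1]). */
    static TransferSketch stackExample() {
        TransferSketch s = new TransferSketch();
        s.element.put(1, 2);                                        // element(O1) = O2
        s.aastore(s.getFieldOnThis(1), Collections.singleton(7));   // push: O7 flows to O2
        s.returnValue(5, s.aaload(s.getFieldOnThis(1)));            // top: O2 flows to O5
        return s;
    }
}
```

The two recorded flows correspond to the annotations on push and top in Figure 4.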

2.3.1 Following pointers

The dataflow rules distinguish between the case where the pointer 
is this, and all other cases. Values obtained from non-this point- 
ers (even those of the same class as this) are UNKNOWN, because 
the intra-class dataflow analysis cannot determine the proper type 
parameter instantiations for their type variables. 

As an example, consider the following code:

class C {
    Set s;                 // O1
    Set foo1(C c) {        // O2, O3
        return this.s;
    }
    Set foo2(C c) {        // O4, O5
        return c.s;
    }
}

We have given the full GJ type, but our analysis is provided with 
unparameterised Java code and aims to determine the type param- 
eters. 

The return this.s statement on line 4 induces a widening
from origin O1 (the declared type of s) to origin O2 (the return
type of foo1): in other words, from Set to Set. Because of the use
of the this pointer, we know that the two Sets have identical type
parameter instantiations (though we do not yet know what they might
be). The return c.s statement on line 7 induces a widening from
origin O1 (the declared type of s) to origin O4 (the return type of
foo2). Since parameter c might be instantiated with different type
parameters than those (to be) declared on line 1, this widening only
makes sense if a substitution is performed. For instance, suppose
that the final GJ code is



class C<T> {
    Set<T> s;                           // O1
    Set<T> foo1(C<T> c) {               // O2, O3
        return this.s;
    }
    Set<Number> foo2(C<Number> c) {     // O4, O5
        return c.s;
    }
}



The widening from O1 to O4 is sensible only under the substitution
of Number for T in O1 (Set<T>). However, this substitution is
not knowable by the parameterisation dataflow analysis: it knows
neither the set of type-variables over which C and Set are
parameterised, nor what type-expressions P_i are used to instantiate
any given pointer. By contrast, the widening from O1 to O2 is
sensible under the identity transformation; this transformation is
known to be valid even though the instantiations of origins O1 and
O2 are not yet known.

To simplify the parameterisation dataflow analysis (and to keep 
it intraclass), and because the information is easily obtained by the 
instantiation analysis, the parameterisation analysis ignores uses of 
pointers other than this. 

2.3.2 Fixed-class expressions 

Expressions that return a fixed class, e.g., new C(), new P[n] 
(where P is primitive), and "foo", cannot be assigned to a vari- 
able declared with a type-variable. GJ only permits upper-bounds 
to be specified for type-variables, yet assignments from these ex- 
pressions all induce lower bound constraints. 

Therefore the transfer functions for the operations New and String
return the lattice $\top$ value, NoVariable. Any origin into which
such expressions flow will not be declared with a type-variable. A
similar GJ restriction applies to values obtained from static fields
or data.

2.3.3 Array creation 

The treatment of anewarray differs from that of new C(), which
cannot be assigned to variables declared with a type-variable.

Consider the Stack example (Figure 1). If the types of values
that flow into the elements of array data have an upper-bound
of T (where T is a type-variable), then we would like to declare
the field as T[] data. But then the assignment T[] data = new
Object[10] would not be valid, since it does not represent a
widening.

In GJ, if T is a type-variable, we cannot create a new class
instance with new T(). However, we can create a new array
instance with new T[10] that allows reading and writing of elements
of class T, even though the created object is in fact of class
Object[]. So, by giving origins to new[] nodes, we allow them
to be used in assignments such as that to data.

2.4 The candidate parameterisation 

The solution to the dataflow problem is obtained in the usual 
way: forward-flow iteration to least-fixed point, using a single- 
entry, single-exit flowgraph. For each method, the dataflow equa- 
tions give an Origins component that associates each origin in the 
class with a lattice value. The value indicates, for each origin, what 
set of source origins may possibly flow into it by a series of assign- 
ments. The dataflow solution for the entire class is the elementwise 
join of the solutions for each method. 

Given this dataflow solution, our aim is to select a set of type 
parameters and their (upper) bounds. The set of type parameters is 
certainly no more than the number of origins. This section shows 
how to select a subset of the origins, how to select bounds, and 
which origins to relate to each selected one. 

If Origins[O] is NoVariable, then origin O cannot be declared
with a type-variable; the origin is unchanged in the GJ
translation.

If Origins[O] is Null, then no values from origins in the current
class flow to O, so we have no constraints on O. In the GJ
translation, O is replaced by a new type-variable bounded by Object.

The remainder of this section describes how type variables and
their bounds are selected for origins into which other origins flow.
The analysis consists of four steps. First, a graph of type
constraints is created from the source code plus the dataflow
solution. Second, the graph is augmented so that array types and
element types are treated consistently. Third, the graph is
simplified. Fourth, type variables that would represent array types
are removed (GJ forbids parameterising over them). Finally, the set
of candidate type variables and their bounds can be read directly
from the graph: each node is a type variable (or a Java class) and
each edge from a type variable is an upper bound on that variable.

2.4.1 The graph of type constraints

The analysis operates over a graph G of type constraints. The 
nodes of the graph are all the classes in the system, plus the ori- 
gins of the current class. The edges represent type constraints: they 
are the Java extends and implements relations, plus additional con- 
straints due to dataflow (assignments) and bounds. 



$G = (Classes \cup Origins,\ E)$

$E = extends \cup implements \cup flows \cup bounds$

$flows = \{ (o, Origins[o]) \mid o \in Origin' \}$

$bounds = \{ (o, javadecl(o)) \mid o \in Origin' \}$

$Origin' = \{ o \mid Origins[o] \notin \{NoVariable, Null\} \}$

Origin' contains the origins with lattice values that are sets;
origins with lattice values of NoVariable or Null were dealt with
above.

2.4.2 Consistent treatment of arrays 




Java arrays have covariant subtyping. Therefore, for every
directed edge between two array origins in the constraint graph, a
corresponding edge must exist between their respective element
origins (and so on, in the case of multidimensional arrays);
similarly, each edge induces an edge between elements representing
arrays of the types connected by the edge. In the Stack example,
this process adds edge O4 → O2 due to existing edge O3 → O1.
Figure 5 shows the constraint graph for the Stack example.

Figure 5: Constraint graph generated by analysis of the Stack class
of Figure 1. Circles denote origin nodes; boxes are class nodes
representing the classes of the program. The edges are a combination
of those arising from the flow analysis and those from the Java
inheritance graph.

Figure 6: Constraint graph for Stack example after reduction step.
Boxes are class nodes, whose SCC contains at least one Java class;
elliptical nodes are type-variable nodes, whose SCCs contain only
origins. The grey arrows represent the element relation.
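The array-to-element direction of this step can be sketched as a small worklist computation. The sketch makes several assumptions that are not in the paper: the class ArrayEdgeSketch is hypothetical, an edge is encoded as a string such as "O3->O1", and only edges from array origins down to their element origins are added (the converse direction mentioned above is not modelled).

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch: propagate edges from array origins to their element origins. */
final class ArrayEdgeSketch {

    /**
     * For each edge a -> b where both a and b have element origins,
     * adds element(a) -> element(b), repeating until nothing changes.
     */
    static Set<String> closeOverElements(Set<String> edges, Map<String, String> element) {
        Set<String> result = new TreeSet<>(edges);
        Deque<String> work = new ArrayDeque<>(edges);
        while (!work.isEmpty()) {
            String[] ends = work.pop().split("->");
            String from = element.get(ends[0]);
            String to = element.get(ends[1]);
            if (from != null && to != null) {
                String induced = from + "->" + to;
                // A newly added edge is revisited, which covers multidimensional arrays.
                if (result.add(induced)) work.push(induced);
            }
        }
        return result;
    }
}
```

Given the edge O3->O1 and the element relation element(O1) = O2, element(O3) = O4 from the Stack example, the sketch adds O4->O2 and leaves edges between non-array origins untouched.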

2.4.3 Graph simplification 

The next step is SCC-merging, local variable elimination, and 
transitive reduction. This step fuses all the nodes in each strongly 
connected component and fuses each node containing only local 
variable origins with its least restrictive bound (lub). Finally, it 
removes the maximum number of edges possible while maintaining 
the partial-order relation. Figure 6 shows the reduced graph for the 
Stack example; no local variable elimination was necessary for 
this example. 

Each SCC contains at most one Java class node. In the GJ
translation of the input program, any origins that share an SCC with
a Java class cannot be represented by a type-variable, but are
translated to the Java class.
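The SCC-merging part of this step can be sketched on the flow edges of Figure 4. This is an illustration only: the class SccSketch is hypothetical, it uses plain mutual reachability in place of a linear-time SCC algorithm (adequate for a graph of this size), and it ignores the class nodes, bounds edges, local variable elimination, and transitive reduction.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical sketch: strongly connected components by mutual reachability. */
final class SccSketch {

    /** Maps every node to the set of nodes in its strongly connected component. */
    static Map<String, Set<String>> components(Map<String, Set<String>> edges) {
        Set<String> nodes = new TreeSet<>(edges.keySet());
        for (Set<String> targets : edges.values()) nodes.addAll(targets);

        // reach.get(n): all nodes reachable from n, including n itself.
        Map<String, Set<String>> reach = new TreeMap<>();
        for (String n : nodes) {
            Set<String> seen = new TreeSet<>();
            Deque<String> work = new ArrayDeque<>();
            work.push(n);
            while (!work.isEmpty()) {
                String cur = work.pop();
                if (seen.add(cur)) {
                    for (String next : edges.getOrDefault(cur, Collections.<String>emptySet())) {
                        work.push(next);
                    }
                }
            }
            reach.put(n, seen);
        }

        // Two nodes share an SCC exactly when each reaches the other.
        Map<String, Set<String>> scc = new TreeMap<>();
        for (String a : nodes) {
            Set<String> comp = new TreeSet<>();
            for (String b : reach.get(a)) {
                if (reach.get(b).contains(a)) comp.add(b);
            }
            scc.put(a, comp);
        }
        return scc;
    }

    /** The source -> sink flow edges annotated in Figure 4. */
    static Map<String, Set<String>> stackFlows() {
        Map<String, Set<String>> g = new TreeMap<>();
        String[][] flows = {
            {"O3", "O1"}, {"O2", "O5"}, {"O2", "O6"}, {"O7", "O2"},
            {"O6", "O8"}, {"O6", "O9"}, {"O8", "O7"}, {"O9", "O7"}
        };
        for (String[] f : flows) {
            g.computeIfAbsent(f[0], k -> new TreeSet<>()).add(f[1]);
        }
        return g;
    }
}
```

On these edges, origins O2, O6, O7, O8, and O9 fall into a single component while O5 remains alone, which agrees with the table in Section 2.4.6, where the first five share one type variable.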

2.4.4 Eliminate variables bounded by a final class

In GJ as in Java, one cannot extend a class declared final, so if
any type-variable has such a class as one of its upper-bounds, then
we eliminate that variable by fusing it with the bound class.

In practice, programmers often forget to annotate classes as final,
so we find the principal benefit of this comes from eliminating
variables F < String.

2.4.5 Eliminate Object[]-bounded variables

GJ does not permit the bound of a type-variable to be a subclass
of Object[], since there would be no way to refer to the element
type of such a type-variable. Therefore we eliminate each such
variable as follows:

1. Colour grey all nodes labeled with a Java class derived from
Object[]; leave all other nodes white.

2. Select any white node N from which there is an edge to a
grey node. If there are none, stop.

3. Let O be the set of origins in the SCC associated with node
N. Define E to be the node representing the element type of
node N:

$E = \bigsqcup_{o \in O} element(o)$

4. Rename node N to E[]. Colour it grey. Go to step 2.

Figure 7 illustrates this process for the Stack example. First
node A is renamed B[], then C is renamed D[].

2.4.6 Final solution 

Now we can read the solution from the graph. The solution con- 
sists of seven type-expressions for the origin declarators and three 
type-variable bounds: 



Number  Type expression
O1      B[]
O2      B
O3      D[]
O4      D
O5      E
O6      B
O7      B
O8      B
O9      B

E < Object
B < E
D < B



This is all the information we would need to emit the parameterised 
class signature for Stack: 

class Stack<E, B extends E, D extends B>
{
    private B[] data;
    private int size;
    E top();
    B pop();
    void push(B o);
    void exchange();
}

However, the output of this step is the set of type expressions, 
possibly containing bounded type-variables, for all origins in the 
table above, because the instantiation analysis of Section 3 requires 
information about local and array origins, not just about the class's 
signature. 
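For concreteness, the candidate parameterisation can be written out with method bodies and compiled. The following is a minimal sketch in present-day Java rather than GJ, under two stated assumptions: the class is renamed InferredStack (a hypothetical name), and the array is created with an unchecked cast because current Java, unlike the GJ array creation described in Section 2.3.3, rejects new D[10].

```java
/** Hypothetical rendering of the candidate parameterisation of Stack. */
final class InferredStack<E, B extends E, D extends B> {
    // O3/O4 (the array creation site, type D[]) flow into O1/O2 (the field, type B[]).
    @SuppressWarnings("unchecked")
    private B[] data = (D[]) new Object[10];
    private int size = 0;

    E top() {                 // O5: widens B to E
        return data[size - 1];
    }

    B pop() {                 // O6
        return data[--size];
    }

    void push(B o) {          // O7
        data[size++] = o;
    }

    void exchange() {
        B o1 = pop(), o2 = pop();   // O8, O9
        push(o1);
        push(o2);
    }
}
```

A client such as InferredStack<Object, String, String> can push and pop strings without casts; after pushing "a" and "b" and calling exchange(), the top element is "a". The three variables are more general than the single parameter a programmer would write by hand, and refinement of the candidate parameters is left to the later analysis.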






Figure 7: Elimination of array-bounded type variables in the Stack example. GJ does not permit the bounds of a type-variable to be an
array-type. Such variables are replaced by E[], where E is the least-upper-bound of their elements. (a) A is to be replaced by B[]. (b) C is to be
replaced by D[]. (c) All array-bounded variables have been eliminated.



2.5 Inheritance 

The parameterisation analysis as described so far is applied to 
each class in the program in isolation. This section describes two 
techniques for combining per-class results to produce more precise 
results. The first ensures that inherited and overridden members 
have consistent types, and the second determines type instantiations 
for extends clauses. 

2.5.1 Compatible inherited and overridden types

In Java and in GJ, an overriding method must have the same
formal parameter types as the overridden method. We describe a
post-processing procedure to enforce this property. (An efficient 
implementation can combine this procedure, and also the 'flatten- 
ing' pre-processing step of Section 2.1, with the main analysis, by 
analysing classes in depth-first pre-order, caching a stack of results 
obtained for superclasses.) 

In this description (as in the rest of this paper), we do not distin- 
guish between classes, abstract classes, and interfaces. 

For each class C in the system (in topological order), we ex- 
amine in turn each class D that inherits from it, either directly or 
transitively, and we examine the set of all origins in D appearing 
in the signatures of methods present in both classes (i.e., inher- 
ited/overridden methods), and origins for fields of C (which are 
inherited by D). 

Let E_{C,D} be the set of unordered pairs of origins of C whose
corresponding origins in D belong to the same type variable as each
other. (With E we thus revisit the equivalence relation among
origins that gave rise to the type variables.) Let
E_C = ⋂_{D ≤ C} E_{C,D}, i.e., E_C is the set of origin-pairs of C
that always belong to the same type variable in all subclasses of C.

Then, we ensure that the origins O_a, O_b for each (O_a, O_b) ∈ E_C
belong to the same variable, fusing variables where necessary.

We will demonstrate this with an example: 

class Abstract<A,B,C,D,E>
{
    A f(B x);
    C g(D x);
    E h;
}

class ConcreteOne<F,G> extends Abstract
{
    F f(F x) { ... }
    G g(G x) { ... }
    G h; // (inherited)
}

class ConcreteTwo extends Abstract
{
    String f(String b) { ... }
    String g(String d) { ... }
    String h; // (inherited)
}


Class Abstract has no method bodies, so no constraints are 
generated and each of the five origins has a different type variable. 
There are two concrete subclasses, each with different generalisa- 
tions of the two inherited methods and the inherited field. 

Numbering the origins 1-5 in order, and abbreviating the class 
names, we get: 

E_{A,C1} = Pairs({1, 2}) ∪ Pairs({3, 4, 5})
E_{A,C2} = Pairs({1, ..., 5})
and so:
E_A = Pairs({1, 2}) ∪ Pairs({3, 4, 5})

where Pairs(S) = {(a, b) | a, b ∈ S, a < b}

We conclude that, in all subclasses of Abstract, the variables 
A and B are instantiated at the same type, as are the variables C, 
D and E. Therefore, we fuse the variables in each set, giving the 
following type for Abstract: 

class Abstract<A,C> 
{ 

A f(A x); 

C g(C x); 

C h; 
} 
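
The fusion step above can be sketched in Java as follows. This is only an illustrative sketch, not the implementation described in this paper: the class OriginFusion, its method names, and the encoding of each subclass as an array of labels (one per origin of the superclass, in origin order) are assumptions made for the example.

```java
import java.util.*;

/** Sketch: fuse superclass origins that share a variable in every subclass. */
class OriginFusion {
    /** Pairs of origins (numbered from 1) that carry the same label in one subclass. */
    static Set<List<Integer>> samePairs(String[] labels) {
        Set<List<Integer>> pairs = new HashSet<>();
        for (int a = 0; a < labels.length; a++)
            for (int b = a + 1; b < labels.length; b++)
                if (labels[a].equals(labels[b]))
                    pairs.add(Arrays.asList(a + 1, b + 1));
        return pairs;
    }

    /** Intersects the pair sets of all subclasses, then groups the fused origins. */
    static List<Set<Integer>> fuse(int origins, List<String[]> subclasses) {
        Set<List<Integer>> common = null;
        for (String[] labels : subclasses) {
            Set<List<Integer>> pairs = samePairs(labels);
            if (common == null) common = pairs; else common.retainAll(pairs);
        }
        // Simple union-find over origins 1..n.
        int[] parent = new int[origins + 1];
        for (int i = 0; i <= origins; i++) parent[i] = i;
        if (common != null)
            for (List<Integer> p : common)
                parent[find(parent, p.get(1))] = find(parent, p.get(0));
        Map<Integer, Set<Integer>> groups = new TreeMap<>();
        for (int i = 1; i <= origins; i++)
            groups.computeIfAbsent(find(parent, i), k -> new TreeSet<>()).add(i);
        return new ArrayList<>(groups.values());
    }

    static int find(int[] parent, int x) {
        while (parent[x] != x) x = parent[x];
        return x;
    }

    public static void main(String[] args) {
        List<String[]> subclasses = Arrays.asList(
            new String[] {"F", "F", "G", "G", "G"},                          // ConcreteOne
            new String[] {"String", "String", "String", "String", "String"}); // ConcreteTwo
        System.out.println(fuse(5, subclasses));
    }
}
```

Each resulting group corresponds to one type variable of the fused superclass; for the Abstract example the two groups correspond to the variables A and C.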

2.5.2 Superclass instantiation 

After variable fusion, we can deduce the extends relation for
both subclasses; that is, the type parameters to the superclass.
Define T_i as the type expression with which the ith type variable
V_i of class C is instantiated by subclass D in its extends-clause.
For any origin in C declared with variable V_i, we take T_i to be the
type expression of a corresponding origin in class D.

Continuing our example, we obtain the following extends-clauses 
for the concrete classes: 

class ConcreteOne<F,G>
    extends Abstract<F, G> { .. }
class ConcreteTwo
    extends Abstract<String, String> { .. }

This demonstrates how we can exploit patterns of use common 
to all subclasses present in the application to (1) infer precise pa- 
rameterisations for abstract methods; (2) reduce unnecessary gen- 
erality for all classes; and (3) deduce the extends relation which 
is required for the next analysis. 



Section 3.7 takes a similar approach to eliminate unnecessary
generality based on patterns of use common to all clients. 

2.5.3 Map.Entry example

Here is an example from the java.util package. This technique
generates the ideal result for interface java.util.Map.Entry,
based on two of its subclasses, TreeMap.Entry and HashMap.Entry.
In the absence of the subclasses, four distinct type variables would
have been produced for Map.Entry. (Results are truncated for
brevity.)

interface java.util.Map.Entry<A, B>
{
    A getKey();
    B getValue();
    B setValue(B);
}
class java.util.TreeMap.Entry<C, D, E, F, G>
    implements java.util.Map.Entry<C, D>
{
    void Entry(C, D, G);
    C getKey();
    D getValue();
    D setValue(D);
    C key; D value; E left; F right; G parent;
}
class java.util.HashMap.Entry<H, I, J>
    implements java.util.Map.Entry<H, I>
{
    void Entry(.., H, I, J);
    H getKey();
    I getValue();
    I setValue(I);
    H key; I value; J next;
}



2.6 Refining the Parameterisation 

The parameterisation analysis computes a candidate set of type
variables and bounds for each class, but often the class is
over-generalised. The instantiation analysis of Section 3 can eliminate
variables, but the results depend on exactly what constraints are
generated from the class's methods and the available client code.

We believe that the quality of the results could be further
improved with some advice from the user. The advice would
take the form of directions as to which type variables are irrelevant
or unnecessary to the design of the class, and should consequently
be eliminated.

We have begun implementation of a graphical tool that allows 
users to browse the class declarations, transformed to reflect the 
candidate parameterisations. 

The tool displays each class signature, highlighting the uses of
different variables in distinct colours. It allows the user to fuse
a pair of variables together, or to eliminate a variable (replacing
each occurrence by its upper bound) by pointing and clicking.
Each edit causes the tool to update the display to reflect the new
parameterisation.

The tool manages the type constraint graph, as described in
Section 2.4, iteratively adding constraints and re-solving in response
to each user action. When the user is satisfied with the results, the
parameterisation analysis is complete.

3. INSTANTIATION ANALYSIS 

The parameterisation analysis of Section 2 determines a generic
type for each class in isolation. This section presents the
instantiation analysis that uses the results of the parameterisation
analysis to deduce a complete, parametric type for every value in the
program. This information can then be used to direct a source-to-
source translation, as described in Section 4.

3.1 Overview 

The instantiation analysis determines a parametric type for every
type expression that makes reference to a parametric class. This
includes those appearing in field and method declarations, bounds on
type variables, extends clauses appearing in type-variable bounds,
declarators of local variables, new expressions, and casts.

The instantiation analysis consists of five steps. First, it adds 
unknowns to instantiation sites (Section 3.3). Second, it gener- 
ates a type constraint from each generalised assignment in the pro- 
gram (Section 3.4) via a one-pass whole-program static analysis. 
Third, it transforms some of the constraints to a more tractable 
form. Fourth, it solves the constraint resolution problem (Sec- 
tion 3.6), possibly performing more transformations as appropri- 
ate. Fifth, it optionally simplifies the results of the parameterisation 
analysis by eliminating unnecessary type parameters (Section 3.7). 

The solution to the set of constraints over the unknowns gives 
concrete values (type expressions) to each of the unknowns. As 
noted in Section 1, this constraint system is guaranteed to have a 
solution. Our goal is to select the most specific possible instantia- 
tion type for each unknown parameter. 

3.2 Example 1: Stack 

Before presenting the five parts of the analysis in turn, we
illustrate and motivate it via our running example of a Stack class,
augmented by some client code (Figure 8). The instantiation
analysis applies to the whole program at once; however, the only other
classes in this program are String and Object, which take no
parameters, hence we omit them. We indicate a parameterless class
by String<> to distinguish it from the GJ raw type String.

First, the analysis annotates each class to reflect the result ob- 
tained from the parameterisation analysis of that class (see Sec- 
tion 2) and annotates all references to any class with a set of fresh 
unknowns, one for each variable on that class. Figure 8 shows the 
annotated Stack code. 

Second, generalised assignments, declarations, and casts of the 
program induce type constraints. 

Third, the type constraints are transformed and simplified. The 
constraints of the Stack program (annotated with their originating 
line numbers, and with trivial constraints omitted) are: 



[L21]    Stack(#1,#2,#3) ← Stack(#4,#5,#6)
[L21,1]  Object() ← [#1,#2,#3/E,B,D]E
[L21,1]  [#1,#2,#3/E,B,D]E ← [#1,#2,#3/E,B,D]B
[L21,1]  [#1,#2,#3/E,B,D]B ← [#1,#2,#3/E,B,D]D
[L22]    [#1,#2,#3/E,B,D]B ← String()

where ← is a widening type constraint. Note that [#1,#2,#3/E,B,D]
represents the substitution caused by following the stk pointer.

Fourth, these constraints simplify to:

#1 = #4
#2 = #5
#3 = #6
Object() ← #1 ← #2 ← #3 ← String()

For each unknown with a lower bound and only the trivial Object()
upper bound, we instantiate the unknown to its lower bound,
repeating the process until no further progress is made. This instantiates
#3, #2, and #1 (in that order) to String(), giving us the result:

20: String<> test(String<> str) {
21:     Stack<String<>, String<>, String<>> stk =
            new Stack<String<>, String<>, String<>>();
22:     stk.push(str);
23:     return (String<>)stk.top();
24: }

 1  class Stack<E extends Object<>, B extends E, D extends B>
 2  {
 3      private B[] data = new D[10];
 4      private int size = 0;
 5      E top() {
 6          return data[size-1];
 7      }
 8      B pop() {
 9          return data[--size];
10      }
11      void push(B o) {
12          data[size++] = o;
13      }
14      void exchange() {
15          B o1 = pop(), o2 = pop();
16          push(o1);
17          push(o2);
18      }
19  }
20  String<> test(String<> str) {
21      Stack<#1, #2, #3> stk = new Stack<#4, #5, #6>();
22      stk.push(str);
23      return (String<>) stk.top();
24  }

Figure 8: Stack example, annotated to reflect the parameterisation
analysis of Section 2, plus calling code, annotated with fresh
unknowns for each type instantiation.

Fifth, we observe that in this program, for any instantiation of
class Stack in the whole program, the type-expressions for each
parameter position are equal. Therefore, the three variables can be
fused into one. Figure 9 shows the final result.

class Stack<E extends Object<>>
{
    private E[] data = new E[10];
    private int size = 0;
    E top() {
        return data[size-1];
    }
    E pop() {
        return data[--size];
    }
    void push(E o) {
        data[size++] = o;
    }
    void exchange() {
        E o1 = pop(), o2 = pop();
        push(o1);
        push(o2);
    }
}
String<> test(String<> str) {
    Stack<String> stk = new Stack<String>();
    stk.push(str);
    return stk.top();
}

Figure 9: Final GJ code for the Stack example and calling code.

The following sections explain each of the above steps in more
detail.

3.3 Insertion of type parameters

The first step is to annotate the program using the results of the
parameterisation analysis. For each class, the result includes a set
of upper-bounded type variables (V_1 ≤ B_1, ..., V_n ≤ B_n), and
a mapping from origins to declaration type-expressions, possibly
containing variables.

The annotation first adds type variables to class declarations and
replaces origins by the type variables. Then, it annotates client
code.

Every reference to a class identifier C (whether in a declaration,
new-expression, cast, etc.) is augmented by a fresh set of
unknowns, one per type variable on class C.

Types appearing in extends and implements clauses are treated
in a similar way. The parameterisation analysis is often able to
determine some of the type-expressions appearing as parameters (see
Section 2.5), because they are constants (such as String) or are a
function of the variables exported from the extending class.

3.4 Constraint generation

We define generalised assignment as the propagation of (unmodified)
reference values from one variable or expression to another
location: anywhere that a reference expression is assignment
converted or method-invocation converted according to the rules of the
Java language [12]. Examples include ordinary assignment, parameter
passing, returning a value from a method, assigning or reading
a field or array element, etc.

At each generalised assignment, there is the potential for a
widening reference conversion. Therefore, in a GJ program, a generalised
assignment indicates a subtype constraint between the types of its
source expression and destination location.

The constraint generation step generates constraints from three
sources. Each instance of a generalised assignment anywhere in
the whole program generates a constraint of the form

typeof(locn) ← typeof(expr)

where typeof(x) is the type of the expression or location x
(Figure 10). The constraint is read as "expr can be assigned to locn".
Each type-variable bound also generates a generalised-assignment
constraint ←, since an expression whose type is given by a type
variable can be assigned to a variable declared by the bound type
expression. Finally, each cast operator (T) e generates a cast
constraint:

T ↜ typeof(e)

Both constraint relations, ← and ↜, define partial orders over
parametric types (they are reflexive and transitive). The grammar
of reference types (T) and of constraints (K) is:

T ::= #n                   (unknowns)
    | C(T_1, ..., T_n)     (parametric classes)
    | V                    (type variables)
    | T[]                  (arrays)
    | P[]                  (primitive arrays)

K ::= T_1 ← T_2            (assignment)
    | T_1 ↜ T_2            (cast)
    | T_1 = T_2            (equality)

For example, p.m(p.f, k) creates two constraints, one from
field p.f to m's first parameter, one from local k to the second
parameter.

x                    typeof(x)
null                 Null
"string literal"     String<>
new C<T1..Tn>()      C(T_1, ..., T_n)
new T[]              T[]
new P[]              P[]
ArrayElement a[x]    element(typeof(a))
Local l              decltype(l)
Field p.f            [T_1, ..., T_n/V_1, ..., V_n]F
Method p.m(...)      [T_1, ..., T_n/V_1, ..., V_n]R

Figure 10: Informal definition of typeof(x). The definitions
for p.f and p.m(...) assume p : C<T1..Tn> and class
C<V1..Vn> { ... F f; R m(...); ... }.

Aside from constraints generated by generalised assignments and
casts, we need to include bounds-constraints on unknowns (for
every reference), and bounds-constraints on type variables (for every
class). The structures of these two kinds of constraint are
parallel: one represents the bounds-constraints inside the class body,
where type variables are in scope, and the other represents the same
constraints but externally, and thus we must apply the appropriate
pointer substitution (see Section 3.4.1) to the same constraint. Both
kinds are subtype constraints, so we use the ← relation. For class
C, and reference p:

class C<B1 extends B2,
        B2 extends Number<>> { .. }

C<#1,#2> p = somefunc();

we obtain these bounds-constraints on the variables:

Number() ← B2 ← B1

and these on the unknowns:

[#1,#2/B1,B2](Number() ← B2 ← B1)
i.e. Number() ← #2 ← #1

3.4.1 Substitution 

In contrast to the parameterisation analysis, constraint generation
follows pointers p, applying a transformation to the declared type
of the entity referred to via the pointer.

The following of a pointer with a parametric type establishes a
different type environment for the body of the class referred to by
that pointer. Just as β-reduction (function application) in the
λ-calculus causes bound variables to be replaced by operand
expressions throughout the λ-body, so in GJ do parametric instantiations
cause type variables to be replaced by type-parameter expressions
throughout the class body.

Figure 10 abbreviates the following field rule and a similar one
for methods:

Γ ⊢ p : C(T_1, ..., T_n)
Γ ⊢ C : class C(V_1, ..., V_n) { ... F f; ... }
------------------------------------------------
Γ ⊢ p.f : [T_1, ..., T_n/V_1, ..., V_n]F

For example, in the following fragment of GJ code, the type of
expression p.v is given by [String(), Integer()/K, V]Vector(V).
This type-expression can be simplified to Vector(Integer()), hence
the assignment to x is valid.



class MyMap<K,V> {
    Vector<V> v;
}
void f(MyMap<String<>,Integer<>> p) {
    Vector<Integer<>> x = p.v;
    // p.v : [String<>,Integer<>/K,V] Vector<V>
    //     : Vector<Integer<>>
} 
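
The substitution in this example can be sketched in Java as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the TypeExpr class and its substitute method are hypothetical names, and a type variable is modelled simply as a name without arguments that happens to be bound in the substitution map.

```java
import java.util.*;

/** A type expression: a class or variable name applied to zero or more arguments. */
final class TypeExpr {
    final String name;
    final List<TypeExpr> args;

    TypeExpr(String name, TypeExpr... args) {
        this.name = name;
        this.args = Arrays.asList(args);
    }

    /** Applies [T1,...,Tn/V1,...,Vn]: every variable Vi bound in env is replaced by Ti. */
    TypeExpr substitute(Map<String, TypeExpr> env) {
        if (args.isEmpty() && env.containsKey(name)) return env.get(name);
        TypeExpr[] newArgs = new TypeExpr[args.size()];
        for (int i = 0; i < newArgs.length; i++) newArgs[i] = args.get(i).substitute(env);
        return new TypeExpr(name, newArgs);
    }

    /** Prints in the annotated style used in this paper, e.g. Vector<Integer<>>. */
    @Override public String toString() {
        StringJoiner joiner = new StringJoiner(",", name + "<", ">");
        for (TypeExpr a : args) joiner.add(a.toString());
        return joiner.toString();
    }
}

class SubstitutionDemo {
    public static void main(String[] argv) {
        Map<String, TypeExpr> env = new HashMap<>();
        env.put("K", new TypeExpr("String"));   // K is instantiated at String<>
        env.put("V", new TypeExpr("Integer"));  // V is instantiated at Integer<>
        TypeExpr declared = new TypeExpr("Vector", new TypeExpr("V"));
        System.out.println(declared.substitute(env)); // type of p.v
    }
}
```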

3.5 Constraint-set augmentation 

The set of constraints generated in Section 3.4 is augmented 
to ease their solution. Some cast constraints give rise to gener- 
alised assignment constraints, and some generalised assignment 
constraints give rise to equality constraints. 

3.5.1 Casts 

Not all casts should influence the final type parameterisation and 
instantiation. While some casts are guaranteed to succeed regard- 
less of calling context, others depend on application invariants be- 
yond the scope of this (or any) analysis. 

If we cannot deduce statically that a cast is redundant, then we 
cannot assume that the parametric type of the cast is a subclass of 
the cast operand type — it could be an unrelated type. For example, 
the following Java program: 



class D extends C { ... }

void f(C c1) {
    Vector v = new Vector();
    v.set(0, new D());
    C c2 = (C)v.get(0); // guaranteed to succeed
    D d = (D)c1; // depends on application
}



has this intermediate-GJ representation during the instantiation anal- 
ysis: 



class D<V> extends C<String> { ... }

void f(C<#1> c1) {
    Vector<#2> v = new Vector<#3>();
    v.set(0, new D<#4>());
    C<#5> c2 = (C<#6>)v.get(0);
    D<#7> d = (D<#8>)c1;
}



Let's assume that during constraint resolution, it becomes clear
that the type of what is returned from get is the same as what is
passed to set, i.e., typeof(v.get(0)) = #2 = #3 = D(#4).

Then the cast in the declaration of c2 (which generates the cast
constraint C(#6) ↜ D(#4)) is a widening that need not appear in the
translated GJ program. Thus, we can convert the cast constraint into
the assignment constraint C(#6) ← D(#4), which gives us #6 = String()
(see Section 3.5.2).

The cast in the declaration of d, however, from C(#1) to D(#8),
cannot be eliminated by GJ, and thus we cannot conclude anything
about the value of #8 from the cast.

Therefore we can only draw the following limited conclusion
from a cast constraint. A constraint C(T_1, ..., T_n) ↜ T_e may
be converted to the constraint C(T_1, ..., T_n) ← T_e if and only
if constraint resolution (Section 3.6) has identified (fused) T_e with
some parametric type D(U_1, ..., U_m) where D ≤ C.

The reasoning behind this is as follows. If the cast operand type 
D is a subtype of the cast type C, then the cast is trivial, made 
redundant by the type-system of GJ. This happens with casts in- 
serted by the programmer using pseudo-generic Java classes. A 
trivial cast is, in effect, an assignment conversion — just like any 
other assignment. 
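
The class-level part of this test (D ≤ C) can be sketched with Java reflection. This is only an illustration: the CastCheck class and isTrivialCast method are hypothetical names, the check works on erased classes only, and it says nothing about the type parameters, which the analysis handles through the congruence rule of Section 3.5.2.

```java
/** Sketch: a cast (C) e is trivial when the operand's class D is a subclass of C. */
class CastCheck {
    static class C {}
    static class D extends C {}

    /** True when casting an operand of class operandType to castType is a widening. */
    static boolean isTrivialCast(Class<?> castType, Class<?> operandType) {
        return castType.isAssignableFrom(operandType);
    }

    public static void main(String[] args) {
        // (C) applied to a D: a widening, so the cast constraint may become an assignment.
        System.out.println(isTrivialCast(C.class, D.class));
        // (D) applied to a C: depends on the application, so nothing can be concluded.
        System.out.println(isTrivialCast(D.class, C.class));
    }
}
```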






3.5.2 Congruence 

Some generalised assignment constraints give rise to equality
constraints via a mechanism we call parameter congruence. A
constraint of the form:

A(T_1, ..., T_n) ← B(U_1, ..., U_m)

implies (i) that B ≤ A in the Java inheritance graph, and (ii) that
B(U_1, ..., U_m) ≤ A(T_1, ..., T_n) in the GJ parametric type system.

Let us first consider the (common) case in which B = A, in
which case m = n. We generate the equivalence constraints T_1 =
U_1, ..., T_n = U_n, because GJ does not admit parameter-variance
of parametric types, i.e.:

C(T_1, ..., T_n) ≤ C(U_1, ..., U_n)  ⟺  ∀i. T_i = U_i

Note that since each T_i = U_i is a constraint over type
expressions, we unify them, possibly giving rise to additional constraints.
For example, Pair(#3, F(#4)) ← Pair(#5, #6) gives us #3 =
#5 and F(#4) = #6.

In the case where A ≠ B, we must consider the effect of a
widening from B to A on the parametric type. In GJ, when a class
extends a generic class, it may generalise or specialise (or both)
the superclass. Therefore some of the type variables of the
superclass are also variables of the subclass while others are instantiated
to a type-expression by the subclass.

We define the function widen as follows:

class B(V_1, ..., V_n) extends A(U_1, ..., U_m)
S_i = [T_1, ..., T_n/V_1, ..., V_n]U_i
------------------------------------------------
widen(B(T_1, ..., T_n), A) = (S_1, ..., S_m)

This function returns the parameter tuple (S_1, ..., S_m) with which
B(T_1, ..., T_n) instantiates A, which the algorithm uses for
generating a set of pointwise equivalence constraints.

So, for example, if class K(P, Q, R) extends J(R, String()), then
widen(K(T_1, T_2, T_3), J) = (T_3, String()).
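
The widen function for a direct superclass can be sketched in Java as below. The Widen class and its parameter names are hypothetical, and the sketch assumes that each superclass argument U_i is either a bare type variable of the subclass or a constant such as String<>; nested arguments would need a recursive substitution.

```java
import java.util.*;

/** Sketch of widen for a direct superclass with atomic superclass arguments. */
class Widen {
    /**
     * @param subclassVars V1..Vn, the type variables of subclass B
     * @param superArgs    U1..Um, from the declaration "B<V1..Vn> extends A<U1..Um>"
     * @param actuals      T1..Tn, the parameters of the reference B<T1..Tn>
     * @return (S1..Sm), where Si is Ui with each Vj replaced by Tj
     */
    static List<String> widen(List<String> subclassVars, List<String> superArgs,
                              List<String> actuals) {
        if (subclassVars.size() != actuals.size())
            throw new IllegalArgumentException("arity mismatch");
        Map<String, String> env = new HashMap<>();
        for (int i = 0; i < subclassVars.size(); i++)
            env.put(subclassVars.get(i), actuals.get(i));
        List<String> result = new ArrayList<>();
        for (String u : superArgs)
            result.add(env.getOrDefault(u, u)); // constants are left unchanged
        return result;
    }

    public static void main(String[] args) {
        // class K<P,Q,R> extends J<R, String<>>
        System.out.println(widen(
            Arrays.asList("P", "Q", "R"),
            Arrays.asList("R", "String<>"),
            Arrays.asList("T1", "T2", "T3")));
    }
}
```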

3.6 Constraint resolution 

Constraint resolution can be viewed as an iterative graph-reduction
process. Each type expression in each of the constraints represents
a node in the graph. Each ← constraint is a directed edge; ↜
constraints do not appear in the graph.

The goal of constraint resolution is to find a set of assignments to
the unknowns such that all the constraints are satisfied. There is at
least one (trivial) solution, but in practice there are many solutions.

It is convenient to consider the set of equivalence-classes of the 
unknowns. Initially, each unknown is in its own equivalence class, 
but as resolution proceeds, these classes are fused. After each con- 
straint resolution step, the constraint augmentation of Section 3.5 
is run; this can be done incrementally. 

3.6.1 SCC-merging 

The primary form of resolution, as in the parameterisation analy- 
sis, is SCC-merging. Wherever a cycle exists in the directed graph, 
all nodes lying on that cycle must be equivalent, so their equiva- 
lence classes are fused. 
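
A deliberately naive Java sketch of SCC-merging follows. It is not the paper's implementation: the SccMerge class is a hypothetical name, nodes are assumed to be numbered from 0, and a cubic-time transitive closure stands in for a linear-time SCC algorithm such as Tarjan's. Two nodes are fused exactly when each is reachable from the other.

```java
import java.util.*;

/** Sketch: fuse nodes that lie on a common cycle of the constraint graph. */
class SccMerge {
    /**
     * @param n     number of nodes, numbered 0..n-1
     * @param edges directed edges, each given as {from, to}
     * @return for each node, the smallest-numbered node in its SCC
     */
    static int[] merge(int n, int[][] edges) {
        boolean[][] reach = new boolean[n][n];
        for (int i = 0; i < n; i++) reach[i][i] = true;
        for (int[] e : edges) reach[e[0]][e[1]] = true;
        // Warshall's transitive closure.
        for (int k = 0; k < n; k++)
            for (int i = 0; i < n; i++)
                for (int j = 0; j < n; j++)
                    if (reach[i][k] && reach[k][j]) reach[i][j] = true;
        int[] rep = new int[n];
        for (int i = 0; i < n; i++) {
            rep[i] = i;
            for (int j = 0; j < i; j++)
                if (reach[i][j] && reach[j][i]) { rep[i] = j; break; }
        }
        return rep;
    }

    public static void main(String[] args) {
        // Nodes 0, 1, 2 form a cycle; node 3 only receives an edge from node 2.
        int[][] edges = { {0, 1}, {1, 2}, {2, 0}, {2, 3} };
        System.out.println(Arrays.toString(merge(4, edges)));
    }
}
```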

3.6.2 Lower bounds 

SCC-merging (and constraint augmentation) may not put every 
unknown in an equivalence class with a concrete type (i.e. a type- 
expression containing no unknowns). 

So, when resolution can make no further progress, for each
unknown #U whose equivalence class contains only other unknowns,
and for which there is a defined greatest lower bound B, we push
#U to that bound B. In other words, we use the most specific
consistent set of instantiations. For example, if the only remaining
constraint on #3 was #3 ← C(#2), it would be replaced with
#3 = C(#2).

3.6.3 Upper bounds 

After pushing to lower bounds, any remaining unknowns that 
have no equivalent concrete type expression are pushed to their 
least upper bound. All unknowns have the upper bound Object(). 
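
The two bound-pushing steps can be sketched in Java as follows. This is an illustration under simplifying assumptions, not the paper's algorithm in full: LowerBoundPush is a hypothetical name, concrete types are unparameterised classes represented by java.lang.Class objects, interfaces are ignored when computing a join, and an unknown is instantiated only once all of its lower bounds are known.

```java
import java.util.*;

/** Sketch: push unknowns to the join of their lower bounds, then to Object. */
class LowerBoundPush {
    /** Nearest common superclass along the superclass chain (interfaces ignored). */
    static Class<?> join(Class<?> a, Class<?> b) {
        Class<?> c = a;
        while (c != null && !c.isAssignableFrom(b)) c = c.getSuperclass();
        return c == null ? Object.class : c;
    }

    /**
     * @param lower maps each unknown to its lower bounds; a bound is either a
     *              Class (a concrete type) or a String naming another unknown
     */
    static Map<String, Class<?>> solve(Map<String, List<Object>> lower) {
        Map<String, Class<?>> solution = new TreeMap<>();
        boolean progress = true;
        while (progress) {
            progress = false;
            for (Map.Entry<String, List<Object>> e : lower.entrySet()) {
                if (solution.containsKey(e.getKey())) continue;
                Class<?> bound = null;
                boolean ready = !e.getValue().isEmpty();
                for (Object b : e.getValue()) {
                    Class<?> c = (b instanceof Class) ? (Class<?>) b : solution.get(b);
                    if (c == null) { ready = false; break; } // bound not yet resolved
                    bound = (bound == null) ? c : join(bound, c);
                }
                if (ready) { solution.put(e.getKey(), bound); progress = true; }
            }
        }
        // Upper-bound step: every unknown still unresolved goes to Object.
        for (String u : lower.keySet()) solution.putIfAbsent(u, Object.class);
        return solution;
    }

    public static void main(String[] args) {
        Map<String, List<Object>> lower = new LinkedHashMap<>();
        lower.put("#1", Arrays.<Object>asList("#2"));        // #1 <- #2
        lower.put("#2", Arrays.<Object>asList("#3"));        // #2 <- #3
        lower.put("#3", Arrays.<Object>asList(String.class)); // #3 <- String
        System.out.println(solve(lower));
    }
}
```

With the chain from the Stack example, the loop resolves #3 first, then #2, then #1, mirroring the order described in Section 3.2.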

3.7 Instantiation patterns analysis 

Section 2.5.2's technique for determining extends-clauses looked
for simple patterns in the instantiation expressions among all
subclasses of a class C. Now that we have computed the complete set
of parametric references C(T_1, ..., T_n) (within C, all its
subclasses, and its clients), we can look for more subtle patterns.

Consider a generic class C(P_1, ..., P_n). If there is some pair
(P_i, P_j) of C's parameters, such that in all subclasses of C, the ith
and jth type expressions with which C is instantiated are related by
some function over types, then one of the variables may be
eliminated, and replaced within class C by the appropriate function of
the other type variable.

For example, the following example shows class C and the
complete set of (two) parametric references to it. Note that P and R are
always instantiated with identical type expressions:



class C<P, Q, R> { .. }

C<String, Vector<String>, String> myref1 = ...;
C<T, Vector<T>, T> myref2 = ...;

Therefore, they can be fused, giving rise to this simplified result: 

class C<P, Q> { .. } // [P/R] in class body

C<String, Vector<String>> myref1 = ...;
C<T, Vector<T>> myref2 = ...;

So far, this case is very similar to the method mentioned already 
in section 2.5.2; indeed, redundant variables such as R would be 
found and eliminated by that analysis. But this technique can be 
generalised to handle cases where the pairs of parameter instantia- 
tion type-expressions are related by a relation other than equality. 
Continuing with our example, note that the instantiation expression 
for the second parameter is always a Vector of the first: 

class C<P> { .. } // [Vector<P>/Q] in class body

C<String> myref1 = ...;
C<T> myref2 = ...;

Such patterns can be found by structural unification of
corresponding elements in the parameter-tuples T_1, ..., T_n of all
references to a given class C.

We have found in working through many examples that a sig- 
nificant number of unwanted type variables are always instantiated 
predictably, either with a constant expression or with some type- 
expression of the other parameters, and so eliminating such cases 
would improve accuracy. 
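
The search for such patterns can be illustrated with a crude Java sketch. It is not structural unification: the InstantiationPatterns class is a hypothetical name, type expressions are plain strings, and textual replacement stands in for unification, so it can be fooled when one type name is a substring of another. The HOLE marker is likewise an assumption of the sketch.

```java
import java.util.*;

/** Sketch: detect when parameter j is always the same function of parameter i. */
class InstantiationPatterns {
    /** Marks the position of parameter i inside the template for parameter j. */
    static final String HOLE = "@";

    /**
     * @param refs the parameter tuples of all references to the class
     * @return the template shared by all references, or null if there is none
     */
    static String template(List<List<String>> refs, int i, int j) {
        String common = null;
        for (List<String> args : refs) {
            String t = args.get(j).replace(args.get(i), HOLE);
            if (!t.contains(HOLE)) return null;            // j does not mention i
            if (common == null) common = t;
            else if (!common.equals(t)) return null;       // references disagree
        }
        return common;
    }

    public static void main(String[] argv) {
        List<List<String>> refs = Arrays.asList(
            Arrays.asList("String", "Vector<String>", "String"),
            Arrays.asList("T", "Vector<T>", "T"));
        System.out.println(template(refs, 0, 2)); // R as a function of P
        System.out.println(template(refs, 0, 1)); // Q as a function of P
        System.out.println(template(refs, 1, 0)); // P as a function of Q
    }
}
```

A template consisting of the marker alone means the two parameters are always equal and can be fused; a template such as Vector<@> means the second parameter can be replaced in the class body by that expression over the first.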

Applying such a refinement requires a new run of the instantiation
analysis to determine the new parameters for the smaller parameter sets.

3.8 Overconstrained variables 

When a set of constraints places a lower bound on a type
variable, we say the variable is overconstrained, and it must be
eliminated.

For example, with the constraints Graph(#1,#2) ← G ←
Graph(#6,#7), where G is a type variable, G is overconstrained,
because it has a lower bound of Graph.

In this case, it is evident that G = Graph(#1 = #6, #2 = #7).
In other cases, we may not be able to infer a precise type for G, so
we simply use the lower bound.

To eliminate the variable, we first remove it from the class's
variable list, replacing occurrences of that variable within the class
body with the bound on that variable. Then we remove the unknown in the
corresponding parameter position from every reference to this class.

3.9 Examples 

We finish our discussion of the instantiation analysis with addi- 
tional examples. 

3.9.1 Example 2: Map 
Assume the following source program:

static Map test() {
    Map m = new HashMap();
    m.put("foo", new Integer(3));
    m.put("bar", new Float(3.0));
    return m;
}

Given this program and HashMap<K,V> extends Map<K,V> from the
parameterisation analysis, we annotate the code as follows:

static Map<#1,#2> test() {
    Map<#3,#4> m = new HashMap<#5,#6>();
    m.put("foo", new Integer<>(3));
    m.put("bar", new Float<>(3.0));
    return m;
}

from which we generate the following constraints:

Map(#3,#4) ← HashMap(#5,#6)
Map(#1,#2) ← Map(#3,#4)
#3 ← String()
#4 ← Integer()
#4 ← Float()

From them, and the fact that method Map.put has type put(K,V),
we can conclude that #1 = #3 = #5 and #2 = #4 = #6, since
the only assignment-compatible Map and HashMap types have the
same parameter-sets (see Section 3.5.2), and that #4 ← Number(),
since Number is the join of Integer and Float. By pushing all
remaining unknowns to their lower bounds (if any), and any
unknowns remaining after that to their upper bounds, we obtain the
ideal type for this client code:

static Map<String<>,Number<>> test() {
    Map<String<>,Number<>> m =
        new HashMap<String<>,Number<>>();
    m.put("foo", new Integer<>(3));
    m.put("bar", new Float<>(3.0));
    return m;
}

3.9.2 Example 3: Graph 

The next example, Graph, is part of a directed graph class in
our prototype Java-to-GJ translator. Its scc() method returns a
Graph whose instantiation type is more complex than that of this.
This example violates an assumption made by related work [9] that,
within the methods of a class C, any reference of type C must be
instantiated with the same type-parameters as this.

(The idea behind scc is that it returns a Graph G', each node of
which represents an SCC of the original graph G and is labeled with
the Set of G nodes in that SCC. Thus if G has type Graph(T),
G' has type Graph(Set(Node(T))). For brevity, the scc method
shown here does not actually compute the SCCs, but it merely has
the same type as a method that would. Likewise, Set is simplified
to contain a single object.)

class Node { Object label; }

class Set { Object value; } // Very small set!

class Graph
{
    Set nodes = new Set();

    void addNode(Object label) {
        Node n = new Node();
        n.label = label;
        nodes.value = n; // Add to 'set'
    }

    Graph scc() {
        Graph g = new Graph(); // new graph has one node
        g.addNode(nodes); // labeled by set of old nodes
        return g;
    }
}

In this simple example, the parameterisation analysis gives each of
the three classes one variable, bounded at Object().

class Node<A extends Object<>> { A label; }
class Set<B extends Object<>> { B value; }

class Graph<C extends Object<>>
{
    Set<#1> nodes = new Set<#2>();

    void addNode(C label) {
        Node<#3> n = new Node<#4>();
        n.label = label;
        nodes.value = n;
    }

    Graph<#5> scc() {
        Graph<#6> g = new Graph<#7>();
        g.addNode(nodes);
        return g;
    }
}



The non-trivial generated constraints (with line numbers) are:

Set(#1) ← Set(#2)            [L6]
Node(#3) ← Node(#4)          [L9]
[#3/A]A = #3 ← C             [L10]
[#1/B]B = #1 ← Node(#3)      [L11]
Graph(#6) ← Graph(#7)        [L15]
[#6/C]C = #6 ← Set(#1)       [L16]
Graph(#5) ← Graph(#6)        [L17]

L6, L9, L15, and L17 induce equivalence constraints, and the
remainder, after substitution, give a lower bound to each
equivalence-class of unknowns:

#1 = #2 ← Node(#3)
#3 = #4 ← C
#5 = #6 = #7 ← Set(#1)

Now we push all unknowns with lower bounds (all of them, in this
case) to those bounds:

#1 = #2 = Node(C)
#3 = #4 = C
#5 = #6 = #7 = Set(#1)


The code below shows the result. We should point out that these 
constraints were generated only by looking at the class Graph itself, 
and no external client code. The result is the ideal type. Many 
classes' own methods generate sufficient constraints to obtain an 
ideal (or close to ideal) result, even in the absence of client code. 

class Node<A extends Object<>> { A label; }
class Set<B extends Object<>> { B value; }

class Graph<C extends Object<>>
{
    Set<Node<C>> nodes = new Set<Node<C>>();

    void addNode(C label) {
        Node<C> n = new Node<C>();
        n.label = label;
        nodes.value = n;
    }

    Graph<Set<Node<C>>> scc() {
        Graph<Set<Node<C>>> g = new Graph<Set<Node<C>>>();
        g.addNode(nodes);
        return g;
    }
} 
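
For comparison, the same result can be written in the generics syntax of later Java releases, where it type-checks without casts. This transcription is an illustration rather than output of the translator: the explicit Object<> bounds are dropped, and the GraphDemo client is a hypothetical addition.

```java
class Node<A> { A label; }
class Set<B> { B value; } // single-element stand-in, as in the example above

class Graph<C> {
    Set<Node<C>> nodes = new Set<Node<C>>();

    void addNode(C label) {
        Node<C> n = new Node<C>();
        n.label = label;
        nodes.value = n;
    }

    Graph<Set<Node<C>>> scc() {
        Graph<Set<Node<C>>> g = new Graph<Set<Node<C>>>();
        g.addNode(nodes);
        return g;
    }
}

class GraphDemo {
    public static void main(String[] args) {
        Graph<String> g = new Graph<String>();
        g.addNode("a");
        Graph<Set<Node<String>>> s = g.scc();
        // No casts are needed to walk from the SCC graph back to the original label.
        String label = s.nodes.value.label.value.label;
        System.out.println(label);
    }
}
```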

3.9.3 Example 4: Violated genericity 

This section presents another example in order to indicate how
the algorithm handles faulty user code. In this example, a client of
a potentially-generic class violates the intrinsic generic-type
invariants of the class by directly assigning the wrong class of object
into a public field:

class OpenBox {
    void set(Object v) { this.v = v; }
    Object v; // not private!
    Object get() { return v; }
}

The parameterisation analysis produces: 

class OpenBox<A extends B, B extends C, C extends Object<>> {
    void set(A v) { this.v = v; }
    B v;
    C get() { return v; }
}

Now let us add the 'rogue' client code, which violates the class's 
encapsulation and writes directly to v: 

OpenBox b = new OpenBox();
b.set("foo");
String s = (String)b.get();
b.v = new Integer(3); // wrong class!

The first three lines appear to be using b in a manner
consistent with OpenBox(String()). However, the assignment on the last
line breaks this consistency.

When we run the instantiation analysis and simplify the con- 
straints, we obtain: 



OpenBox<#1,#2,#3> b = new OpenBox<#4,#5,#6>();
b.set("foo");
String<> s = (String<>)b.get();
b.v = new Integer<>(3);



[Simplified constraints over the unknowns #1 to #6, with bounds involving Object<>, Integer<> and String<>; the original display is not recoverable here.]



The dual lower bounds on #2 cause constraint resolution to yield
the result #3 = String<> ∧ #2 = Object<> ∧ #1 = Object<>,
giving the following translation for line 1 of the client code:

OpenBox<Object,Object,String> b =
    new OpenBox<Object,Object,String>();
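
The effect of two unrelated lower bounds on one unknown can be illustrated with a least-common-superclass computation. The sketch below is an assumption about how such a join might be computed, not the resolution procedure of this paper: it works on erased java.lang.Class objects, ignores interfaces and type arguments, and the names Lub and leastCommonSuperclass are hypothetical.

```java
// Sketch only: joining two lower bounds by walking up the superclass chain.
class Lub {
    // Expects non-interface reference classes; interfaces are ignored (assumption).
    static Class<?> leastCommonSuperclass(Class<?> a, Class<?> b) {
        Class<?> c = a;
        // Climb from a until we reach a class that b can be assigned to.
        while (!c.isAssignableFrom(b)) {
            c = c.getSuperclass();
        }
        return c;
    }
}
```

For lower bounds such as String and Integer, the only common superclass is Object, which is why a single rogue assignment is enough to widen the unknown all the way to the top of the hierarchy.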

The result demonstrates that the algorithm does not rely on clients
being well-behaved (with respect to encapsulation, etc.) in order to
give correct results. However, the cost of this 'rogue' use is that the
class's inferred generic type is far from ideal.

4. TRANSLATION OF JAVA TO GJ 

The instantiation analysis of Section 3 associates a parametric 
type with every declaration in the source program. The remaining 
step is to translate the program from Java to GJ. 

This translation can be effected at source level, at the class-file 
level (when source code is not available, e.g., for third-party li- 
braries), or at a user-specified combination of the two. We describe 
the two approaches below. 

4.1 Generating GJ source 

We described the analyses at the level of JVM bytecodes (and our 
prototype implementation operates at that level), but the analyses 
could also be performed on an AST (abstract syntax tree) instead. 
No matter how computed, the results can be applied to source-to- 
source translation so long as the class-files contain accurate Line 
Number and Local Variable tables. (The order of declarations dis- 
tinguishes multiple declarations that appear on a single line.) 

The source-to-source translating tool maintains whitespace and 
comments to the greatest extent possible. It replaces Java casts by 
GJ casts and omits redundant casts — those for which the target 
type of the cast is equal to, or is a superclass of, the inferred type 
of the expression. 
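
The redundancy test for a cast can be sketched on erased classes as follows. This is an illustration under simplifying assumptions, not the tool's implementation: it compares java.lang.Class objects only, so type arguments are ignored, and Class.isAssignableFrom also accepts superinterfaces. The names CastCheck and isRedundantCast are hypothetical.

```java
// Sketch only: a cast is redundant when the target type is the inferred
// type itself or one of its supertypes.
class CastCheck {
    static boolean isRedundantCast(Class<?> target, Class<?> inferred) {
        return target.isAssignableFrom(inferred);
    }
}
```

For example, once an expression is inferred to have type String, a cast to String or to Object can be dropped, whereas a cast from an expression of inferred type Object down to String must stay.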

This paper does not discuss the additional work required for the
correct handling of certain constructs: top-level assignments of static
fields and instance fields (which are implicitly moved into <clinit>
and <init> methods respectively), generated default constructors,
hidden parameters (between outer and inner classes), the assert and
.class constructs (which desugar to multiple basic blocks), etc.
As one simple example, consider a declarator that introduces multiple
variables, for example Object o1 = f(), o2 = g();. If the two
variables are given different GJ types by the analysis, then the
declarator Object must be replaced by two distinct ones.
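
A sketch of what the rewritten declarator might look like, in standard Java generics syntax. The method names f and g come from the text; their bodies and the inferred types Vector<String> and Integer are assumptions chosen purely for illustration.

```java
import java.util.Vector;

// Sketch only: one multi-variable declarator split into two declarations.
class DeclaratorSplit {
    // Hypothetical bodies for f and g.
    static Vector<String> f() {
        Vector<String> v = new Vector<String>();
        v.add("foo");
        return v;
    }

    static Integer g() { return Integer.valueOf(3); }

    static String demo() {
        // Before translation:  Object o1 = f(), o2 = g();
        Vector<String> o1 = f();   // first variable gets its own declarator
        Integer o2 = g();          // second variable gets a different one
        return o1.get(0) + o2;
    }
}
```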

4.1.1 Splitting local variables 

One possible optimisation is local splitting: static analysis of the
flowgraph will show when a local variable is reused, i.e., it has two
or more disjoint live ranges. In such cases, we may want to split
the variable into two, giving each live range a distinct name, and
inferring the most precise type for each one [11]. Doing this before
the flow-insensitive instantiation analysis can indicate when a Java
programmer has reused a variable at a different type.

In this example, the local Object o has two live ranges, the first 
as Integer, the second as String: 

Vector vil = new Vector(); // Vector<Integer>
vil.add(new Integer(3));

Vector vsl = new Vector(); // Vector<String>
vsl.add("foo");

Object o;

for (int i=0; i<vil.size(); ++i)
{
    o = vil.get(i);
    System.out.println(o);
}

for (int i=0; i<vsl.size(); ++i)
{
    o = vsl.get(i);
    System.out.println(o);
}

Without local splitting, the type of o would remain unchanged at
Object, and any casts on its uses would still be required in the
translated GJ code. However, local splitting permits replacing the
declaration of o by two new declarations, one inside the body of each
loop: Integer o1 and String o2.
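
The outcome of local splitting on this example might look like the following, written in standard Java generics syntax. This is a sketch: the wrapper class LocalSplitting, the run method, and the operations performed on o1 and o2 are assumptions added so that the benefit of the more precise types is visible and checkable.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Vector;

// Sketch only: the variable o replaced by one declaration per live range.
class LocalSplitting {
    static List<String> run() {
        List<String> out = new ArrayList<String>();

        Vector<Integer> vil = new Vector<Integer>();
        vil.add(Integer.valueOf(3));

        Vector<String> vsl = new Vector<String>();
        vsl.add("foo");

        for (int i = 0; i < vil.size(); ++i) {
            Integer o1 = vil.get(i);                       // first live range
            out.add(String.valueOf(o1.intValue() + 1));    // Integer method, no cast
        }

        for (int i = 0; i < vsl.size(); ++i) {
            String o2 = vsl.get(i);                        // second live range
            out.add(o2.concat("!"));                       // String method, no cast
        }
        return out;
    }
}
```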

Local-splitting further complicates the treatment of declarators 
in the source-to-source translation, since it requires both the re- 
moval and addition of declarations, and potentially the renaming of 
references to variables. 

4.1.2 Inner classes 

In GJ, inner classes (i.e., non-static nested classes) are within 
the scope of the type variables of their corresponding outer classes. 
Ideally, we do not wish to duplicate the implicit passing of type 
parameters, just as in regular Java we do not wish to duplicate the 
implicit passing of value parameters from outer instances to inner 
instances. 

If we can observe that, for every allocation of the form outer.new
C<T1, ..., Tn>() for a given C, there are some parameters in common
between the Ti and the type of the expression outer, then those
parameters can be eliminated (by fusion with the outer class
parameter) and removed from the definition of, and every reference
to, the inner class.
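
The scoping rule that makes this fusion possible can be seen in a small sketch, here in standard Java generics syntax. The names Outer, Inner, item and peek are hypothetical; the point is only that Inner mentions T without declaring a parameter of its own.

```java
// Sketch only: an inner class sharing the type variable of its outer class.
class Outer<T> {
    T item;

    Outer(T item) { this.item = item; }

    // Inner is within the scope of T, so a separate parameter that always
    // equals T would be redundant and can be fused away.
    class Inner {
        T peek() { return item; }
    }
}

class InnerDemo {
    static String demo() {
        Outer<String> outer = new Outer<String>("foo");
        Outer<String>.Inner in = outer.new Inner();   // the outer.new C() form
        return in.peek();
    }
}
```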

4.2 Retrofitting class-files 

GJ is based upon type erasure; that is, the GJ compiler, after 
typechecking, discards all type variable annotations, inserts casts 
where required, and generates identical code to that of the equivalent
Java program. Java class-files have a dual role: they are the
executable, and they provide signatures for separate compilation.
GJ adds generic Signature attributes to the class-file to permit the 
GJ compiler to reconstruct the generic type of the class contained 
within it. 
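
A consequence of erasure can be observed directly in Java's own generics, which follow the same erasure design: two instantiations of a generic class share a single run-time class. The sketch below uses java.util.ArrayList only as a convenient generic class; ErasureDemo and sameRuntimeClass are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: type arguments are not present in the run-time class.
class ErasureDemo {
    static boolean sameRuntimeClass() {
        List<String> strings = new ArrayList<String>();
        List<Integer> ints = new ArrayList<Integer>();
        // Both objects are instances of the one erased class ArrayList.
        return strings.getClass() == ints.getClass();
    }
}
```

Because the executable code is unchanged, the generic information has to travel separately, which is the role of the Signature attributes mentioned above.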

The GJ retrofitter superimposes a generic type on an existing 
Java class-file (containing a pseudo-generic class). Users of third- 
party Java libraries who wish to work in GJ need not translate the 
library (which they might have no source for) by hand into GJ. They 
need only specify the generic type of the library's interface, a much 
smaller problem that requires no information of the library beyond 
that provided in the interface documentation. The retrofitter can 
then add the generic type to the library. 



Our work is complementary. Using our system, generic types 
can be inferred automatically for classes for which only class-files 
are available, and then retrofitted back into those class-files. 

There are some additional subtleties, such as the potential lack
of debugging tables (local variable tables in particular). However,
techniques exist for computing a conservative set of declarations
for locals [11].

5. RELATED WORK 

This paper presents a constraint-based generic type inference al- 
gorithm for the Java language. The most closely related work is 
other type inference algorithms that operate in the presence of poly- 
morphism. 

Milner [17] introduced the notion of polymorphic type inference, 
which is fundamental to the ML programming language. The orig- 
inal type checking algorithm did not address object-oriented pro- 
gramming languages with their type hierarchies and inheritance, 
but subsequent work [20, 22] extends Hindley-Milner typechecking 
to OO languages and to many other application domains. Polymor- 
phic type inference for object-oriented languages can be roughly 
categorized according to the task that it supports (reverse engineer- 
ing or optimization); the variety of polymorphism (data or para- 
metric) supported; the analysis technique (constraints or dataflow); 
and the type system (dynamic or static; explicit or inferred). We 
discuss each of these dimensions in turn before discussing the most 
closely related papers in greater detail. 

Analysis. Two basic approaches to type inference are constraint 
resolution [2, 9, 15, 26] and abstract interpretation [24]. Constraint 
resolution builds a set of constraints (such as equalities or inequal- 
ities) from the problem domain (the program text), then hands the 
constraints off to a resolution system that returns either a simpli- 
fied set of constraints or a specific solution (if only one exists). Our 
work uses constraint resolution, but since there are many solutions 
to our constraints, we must take care to produce the best solution 
among the possible ones. 

The alternative to constraint resolution is abstract interpretation [8, 
24, 23], typically implemented via dataflow. An abstract value (for 
instance, a set of possible run-time types) is flowed around the pro- 
gram, and each program operation affects the abstract value in a 
well-defined manner. 

Task. Polymorphic type inference aimed at reverse engineer- 
ing [25, 9] aims to broaden the applicability of a pre-existing com- 
ponent in order to permit it to be used in more situations. Code 
transformations either enable the broader applicability or provide 
compile-time type correctness guarantees. Our work fits in this 
category. Type inference for ML-style languages also arguably has 
primarily a software engineering goal, since its key purpose is early 
detection of errors that would otherwise persist until run time, if 
they were ever noticed at all. 

A more common type inference application is optimization. Specific
applications include statically discharging run-time casts [7, 26],
eliminating virtual dispatch [3], unboxing, and alias analysis [19, 18].
A high-precision context-sensitive abstract interpretation [24]
serves such an application well. The analysis determines
a set of possible run-time types for each static operation in the pro- 
gram. If there is only one run-time type, then objects of only one 
class ever reach that program point, and any virtual method dis- 
patch can be inlined or converted into a static procedure call of the 
appropriate overriding implementation of the method. Likewise, 
if all run-time types that reach a check satisfy the check, then the 
check can be removed. 

Polymorphism variety. Polymorphism occurs in both parametric
(functional) and data varieties. Parametric polymorphism [22,
7, 21, 1, 2, 15, 26] refers to the ability of procedures to operate on
arguments of arbitrary types, without caring what the specific type
is (for example, length : list a -> int). Data polymorphism [9, 23]
is the ability to store objects of different types in a variable or field.
It is enabled by object-oriented subtypes or by dynamic typing [7].
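
The two varieties can be contrasted in a few lines of Java (generics syntax). This is an illustrative sketch only; PolymorphismDemo and its method names are hypothetical, and length mirrors the list-length example above.

```java
import java.util.List;

// Sketch only: parametric versus data polymorphism.
class PolymorphismDemo {
    // Parametric: works for any element type A and never inspects it.
    static <A> int length(List<A> xs) {
        int n = 0;
        for (A x : xs) {
            n++;
        }
        return n;
    }

    // Data: a single variable holds objects of different run-time types.
    static String dataPolymorphism() {
        Object o = "foo";
        String first = o.getClass().getSimpleName();
        o = Integer.valueOf(3);
        return first + "," + o.getClass().getSimpleName();
    }
}
```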

C++ classes and functions, which range over both primitive and
class types, manifest parametric polymorphism, whereas inheritance
enables data polymorphism. In Java, parametric polymorphism
(e.g., java.util.Vector) is implemented in terms of the
data polymorphism of the Object hierarchy. Even though the underlying
JVM remains unchanged, GJ separates these functions
conceptually by eliminating the need for explicit casts. Our work,
which targets GJ, is most concerned with parametric polymorphism.

Constraint-based analyses, like ours, tend to be most appropri- 
ate for detecting parametric polymorphism. Abstract interpretation 
deals well with data polymorphism, since the goal is to determine 
what types may appear in a particular variable. 

Type system. The language's type system affects the analysis 
that must be performed on it. In a dynamically or implicitly typed 
language [20, 22, 21, 15, 10, 7, 1, 2], data polymorphism is implicit 
and elimination of type checks is a major motivation. However, 
there is little room for standard type analysis. 

Statically typed languages take advantage of compile-time type 
checking. Type inference or reconstruction [17, 20, 15, 25] must be 
used even for explicitly typed languages, if the source types do not 
capture the analysis information; that is the case for our analysis. 
The Hindley-Milner algorithm was originally proposed to operate
over equality constraints [17]. More recent work that extends it to
object-oriented languages uses subtype constraints instead of
equality constraints [10, 9].

Gagnon et al. [11] present a modular, constraint-based technique 
for inference of static types of local variables in Java bytecode; this 
analysis is typically unnecessary for bytecode generated from Java 
code, but is sometimes useful for bytecode generated from other 
sources. No polymorphic types are inferred, however. 

5.1 Generalisation for re-use 

There are two notable previous papers that use automated infer- 
ence of polymorphism where the application is source-code gener- 
alisation for re-use. 

Since the result is source code for human consumption, rather 
than deductions for later analysis or optimisation, one of the pri- 
mary goals is restricting the degree of polymorphism so that the 
results do not overwhelm the user. Typically, programs contain 
much more 'latent' polymorphism than that actually exploited by 
the program. 

Siff and Reps [25] aim to translate C to C++; they detect latent
polymorphism in C functions designed for use with parameters of
primitive type and generalise the functions into template functions
to work over arbitrary types. Recall that in C++ one can define
arithmetic operators for class types. Their algorithm determines,
and documents, the set of constraints imposed by the generalised
function on its argument. (They give as an example the x^y function
pow(), which is defined only for numbers but could be applied
to any type for which multiplication is defined, such as Matrix or
Complex.) Their work focuses exclusively on generic functions,
not classes, and tries to detect latent reusability; in contrast, our
work seeks to enforce stronger typing where reusability was
intended by the programmer. Furthermore, C++ templates need not
typecheck; they operate by simple textual substitution, and only
the resulting code need typecheck. Therefore, the problem is quite
different.

The most closely related research to ours is Duggan's constraint-based
type analysis for inferring genericity in a Java-like language [9].
Duggan gives a modular (intra-class) constraint-based parameterisation
analysis for a monomorphic OO kernel language called MiniJava;
the target is a polymorphic variant, PolyJava, that permits
abstracting classes over type parameters. The translation makes some
casts provably redundant.

We extend Duggan's work in a number of ways. He does not address
abstract classes or interfaces. He does no instantiation analysis,
nor does he use client information to reduce genericity, so his
discovered generic types are unusably over-generic. He assumes
that within class C, all references to instances of class C have the
same parameters as this. His type hierarchy is a forest of trees,
each of which has exactly the same number of parameters on all
classes within it. (Each tree inherits from Object<> via a special-case
rule.) Subclasses may neither add nor remove type parameters,
and the number of parameters inferred for a tree of classes is based
only on the class at the root of that tree. In Java, the generic type
is rarely manifest in the base: most generic classes have relatively
abstract superclasses.

6. CONCLUSION 

6.1 Status and future work 

We are in the midst of implementing the algorithms described 
in this paper. Parts of the process are now automated, but for our 
experiments we performed other steps by hand. We have exper- 
imented with the algorithms on a suite of test classes that prove 
problematic for other approaches, and also on more realistic code, 
such as parts of the implementation itself. When the implementa- 
tion is complete, we plan to analyze larger codebases quantitatively 
and also to gain experience with use of the tool through case stud- 
ies. That will indicate, for example, whether there is need for ad- 
ditional techniques (either human-assisted or automatic) to refine 
the results of the parameterisation analysis. For example, is full 
unification of type parameters advantageous, or is the technique of 
Section 3.7 of little practical use? 

Finally, we would like to experiment with how constraints are added 
to the instantiation analysis model. Currently, each new client adds 
more constraints, removing (unused) aspects of the generalisation. 
Rather than removing spurious aspects from the maximum gener- 
alisation, it would be interesting to compute the maximum general- 
isation, but to include in the results only as much of it as is actually 
used. The distinction is similar to that separating optimistic and 
pessimistic analyses. 

6.2 Contributions 

We have presented a constraint-based whole-program reverse en- 
gineering algorithm for inferring generic types from Java programs. 
The algorithm operates in two steps: first, it determines a generic 
type for each class declaration, and then it determines the actual 
types at which each use of the class is instantiated. The two al- 
gorithms together enable translating Java programs into semanti- 
cally equivalent GJ (Generic Java) programs with generic types. 
Preliminary investigation of the algorithms suggests that the result 
is usually ideal or close to ideal (that which an experienced Java 
programmer would have written). Automatically-inferred generic 
types for Java classes will permit programmers to enjoy the benefits 
of parametric polymorphism — such as machine-checkable docu- 
mentation of programmer intent, compile-time checking for errors, 
and reduced code clutter — at much lower cost than converting legacy 
code by hand. This is an attractive proposition with the upcoming 
release of Java 1.5. 

Our research improves on previous work in several respects. First, 
our research applies to real languages: it handles all Java language 
features, and it accounts for limitations and features of GJ. Ex- 
amples include handling of arrays and of relations between sub- 
classes and superclasses. A toy source or destination language for 
the translation would have simplified the algorithms, but would not 
have been as practical. 

Second, it uses context information from clients and from sub- 
classes to improve its results. For instance, client-side information 
can refine the parameterisation of classes by discovering patterns 
among all uses of a class in the particular application. Similarly, the 
algorithms obtain parameterisations for abstract classes by combining
information from the concrete subclasses. And they refine superclass 
information based on constraints in subclasses. The use of con- 
text information eliminates unwanted generality and, due in part to 
self-uses, can produce ideal results even for a small or nonexistent 
application. 

Third, it handles many realistic special cases. Our algorithms 
accommodate the generic specialisation and extension that may ac- 
company inheritance. They do not assume that client uses that lie 
within the class being analyzed must use the same type parameters. 
They have limited handling of complex and recursive bounds. The 
algorithms are robust even in the face of realistic casting scenarios such 
as overwidening, application invariants, and client errors. 

Fourth, it determines instantiation types for clients of paramet- 
rically polymorphic classes; the algorithms push genericity results 
back through the whole program. 

Fifth, it enables translation of (client and implementation) source 
code to a language with genericity, rather than (for example) pro- 
ducing a result only for optimization or for examination by humans. 

As a result of these improvements, our research produces cleaner 
abstractions than other approaches to the same or similar problems. 
We believe it to be a promising approach to migrating programmers 
towards parametric polymorphism. 